
Coding Theory

#### **Chapter 10**

## Many-Core Algorithm of the Embedded Zerotree Wavelet Encoder

*Jesús Antonio Alvarez-Cedillo, Teodoro Alvarez-Sanchez, Mario Aguilar-Fernandez and Jacobo Sandoval-Gutierrez*

#### **Abstract**

In the literature, image compression has been implemented using a variety of algorithms, such as vector quantization, subband coding and transform-based schemes. The current problem is that the selection of an image compression algorithm depends on compression-ratio criteria, but the quality of the reconstructed images depends on the technology used. Several papers on wavelet transform-based coding show this field as an emerging option for image compression with high coding efficiency. It is well known that the new wavelet-based image compression scheme JPEG 2000 has been standardized. This chapter presents a novel algorithm, executed in parallel, that uses the embedded zerotree wavelet coding scheme; the programs integrate parallelism techniques so that they can be implemented and executed on the many-core system Epiphany III.

**Keywords:** image compression, vector quantization, subband coding, embedded systems, Zerotree wavelet

#### **1. Introduction**

The wavelet transform provides a compact multi-resolution representation of an image and allows the use of energy compaction to exploit redundancy and to achieve compression [1].

The discrete wavelet transform (DWT) is usually implemented, in the literature, using a two-channel wavelet filter bank in a recursive process [2].

The two-dimensional DWT of an image is usually calculated using a separable approach: the input image is scanned in the horizontal direction and passed through the low-pass and high-pass decomposition filters [3]. The filtered data are then subsampled vertically in order to separate the low-frequency and high-frequency data in the horizontal direction. The resulting output data are scanned vertically, and the filters are applied again to generate the characteristic frequency subbands [4]. After the subsampling stage, the transformation generates four subbands, LL, LH, HL and HH, each of which represents 25% of the original image size [5–8].

In this particular case, the energy is concentrated in the low-frequency LL subband, so it represents a low-resolution version of the original image. The higher-frequency subbands contain detailed information in three directions (horizontal, vertical and diagonal).
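As a concrete sketch of this separable decomposition, the following Python fragment computes one level of the 2-D DWT and yields the four subbands, each 25% of the original size. It uses the simple Haar filter pair as an illustrative choice of low-/high-pass decomposition filters; the chapter itself does not fix a particular wavelet.

```python
import numpy as np

def haar_dwt2_level(img):
    """One level of a separable 2-D DWT with the orthonormal Haar pair:
    horizontal low/high filtering with subsampling, then the same vertically,
    yields the LL, LH, HL and HH subbands."""
    x = img.astype(float)
    # Horizontal pass: sum (low) and difference (high) of column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Vertical pass on each half: sum/difference of row pairs.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

img = np.arange(64.0).reshape(8, 8)
subbands = haar_dwt2_level(img)
for name, s in zip(("LL", "LH", "HL", "HH"), subbands):
    print(name, s.shape)  # each subband is 4x4 -> 25% of the 8x8 input
```

Because the Haar step used here is orthonormal, the total energy of the four subbands equals the energy of the input image, which is the property the coder exploits. (Subband naming conventions for LH/HL vary between authors.)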


**Figure 1.**

*(a) Wavelet decomposition in subbands and (b) higher and lower level coefficients for an original image.*

At the end of the process, the image is further decomposed by applying the 2-D DWT algorithm to the LL subband [9]. With this iterative process, multiple levels of transformation are generated in which the energy is fully compacted and represented with few low-frequency coefficients [10]. This can be seen in **Figure 1(a)** and **(b)**. **Figure 1** shows an example of a three-level decomposition of an image using the wavelet transformation, and both images define the parent/child relationship between levels.
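The recursive re-decomposition of the LL subband, and the resulting energy compaction, can be illustrated with a self-contained Python sketch. The smooth test image and the Haar filters below are illustrative assumptions, not the chapter's test data:

```python
import numpy as np

def haar_step(x):
    # One separable Haar analysis step (orthonormal).
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    return ((lo[0::2] + lo[1::2]) / np.sqrt(2),
            (lo[0::2] - lo[1::2]) / np.sqrt(2),
            (hi[0::2] + hi[1::2]) / np.sqrt(2),
            (hi[0::2] - hi[1::2]) / np.sqrt(2))

def multilevel(img, levels):
    """Recursively re-decompose the LL subband, as in Figure 1."""
    ll = img.astype(float)
    details = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_step(ll)
        details.append((lh, hl, hh))
    return ll, details

# A smooth synthetic image: its energy compacts into the small top-level LL.
n = 64
r, c = np.mgrid[0:n, 0:n]
img = np.cos(r / 16.0) + np.sin(c / 16.0)
ll, details = multilevel(img, 3)
frac = float(np.sum(ll ** 2) / np.sum(img ** 2))
print(ll.shape, round(frac, 4))  # the 8x8 LL holds most of the energy
```

After three levels the LL subband is only 1/64 of the pixels, yet for a smooth image it retains the bulk of the energy, which is exactly why few low-frequency coefficients suffice.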

#### **2. Many-core technology**

Epiphany III is a low-cost embedded system formed by a main memory and 16 cores distributed in a mesh, as shown in **Figure 2**. The system is characterized by low energy consumption and a high level of parallelism, concurrency and computing power. All these features, in combination, allow data to be processed at different levels of software and hardware, thereby performing operations on each core [11, 12].

A multi-core system needs a shared memory space, which in this system consists of 2³² bytes. The memory addresses are accessed as unsigned numbers from 0 to 2³² − 1; together, they represent 2³⁰ 32-bit words, to which any core has concurrent access through a 2D mesh topology [13]. The use of this interconnection topology avoids the overloading or blocking of access to shared memory, which is reported in the literature as a factor that limits high-performance systems [14–17].


*DOI: http://dx.doi.org/10.5772/intechopen.89300*


**Figure 2.** *Components of the Epiphany III architecture [11].*

This system has an energy consumption of 1 W per 50 gigaflops in single-precision calculations; the cores are fabricated in 65 nm technology. The cores are RISC, can run at a speed of 1 GHz, and all 16 have the same architecture [11, 14].

With this memory capacity, the Epiphany III system can process an image of 128 × 128 pixels, which is equivalent to 16 kB of memory.
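As a quick sanity check of these figures, the fragment below verifies the image footprint (assuming 8-bit grayscale pixels, which is our reading of the 16 kB figure) and the address-space arithmetic of the shared memory described above:

```python
# Image footprint: 128 x 128 pixels at 1 byte per pixel (assumed 8-bit gray).
width = height = 128
bytes_per_pixel = 1
image_bytes = width * height * bytes_per_pixel
print(image_bytes, image_bytes // 1024)  # 16384 bytes = 16 kB

# Shared address space: 2**32 byte addresses hold 2**30 32-bit (4-byte) words.
assert 2 ** 32 // 4 == 2 ** 30
```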

#### **3. Embedded Zerotree coding**


A wavelet represents a waveform of limited duration that has an average value close to zero. A wavelet transform maps a signal from the time domain to the time-scale domain. Wavelet coefficients are two-dimensional; given this, an image can be represented using trees, due to the subsampling that is performed in the transformation.

Fourier analysis divides a signal into sine waves of various frequencies; wavelet analysis, by contrast, breaks a signal into shifted and scaled versions of the original, or mother, wavelet.

In zerotree coding, each wavelet coefficient at an arbitrary scale can be related to a set of coefficients at the next finer scale.

A zerotree root (ZTR) represents a zero-valued coefficient at a coarse scale for which all the coefficients at the finer scales have that same value. By specifying a ZTR, the encoder can track and reset all the related coefficients at the finer scales.

The EZW encoder is a particular type of encoder used to encode image or sound signals of any dimension; it offers the advantages found in denoising algorithms.

When the wavelet transform is applied to an input image, embedded zerotree wavelet (EZW) encoding allows the encoder to quantize the coefficients using a binary encoding to create a representation of the image [18]. EZW uses the direct relationship between the upper- and lower-level coefficients (parents and children) to obtain maximum coding efficiency [19, 20].
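This parent/child relationship can be made concrete with a small helper (an illustrative Python sketch, not part of the chapter's implementation): because each DWT level subsamples by two, a coefficient at one scale covers a 2 × 2 block of same-orientation coefficients at the next finer scale, as in **Figure 1(b)**.

```python
def children(r, c):
    """Children of the wavelet coefficient at (r, c) in a detail subband:
    the 2x2 block of same-orientation coefficients at the next finer scale."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]

print(children(3, 5))  # [(6, 10), (6, 11), (7, 10), (7, 11)]
```

Iterating this map downward enumerates all descendants of a coefficient, which is the set a ZTR symbol implicitly sets to zero.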

In order to perform the EZW encoding, it is necessary to perform the following steps:

STEP 1: Determine the initial threshold using bit-plane coding; in the subsequent iterations, the threshold (Ti) is reduced by half, and only the coefficients < 2Ti are encoded in each flow.

EZW involves two passes, as a recursive process:

• Dominant pass.

• Subordinate pass.

STEP 2: Two individual lists are defined:

1. Key stage.

2. Low pass.

Encoding at each threshold involves the use of two passes:

a. Dominant pass: contains the coordinates of the non-significant coefficients.

b. Subordinate pass: contains the magnitudes of the significant coefficients.

In the dominant pass, the magnitude of the wavelet coefficients is compared with an arbitrary threshold value; the essential data are determined, and the coefficients are described by their absolute value. The scanning is done over the spatial frequencies, and two bits are used to encode the sign and the position of the significant coefficients. The threshold usually starts at the highest power of two below the maximum wavelet coefficient value; a coefficient that is itself insignificant but has a significant descendant is treated as an isolated zero.

STEP 3: Dominant pass (significant pass):

The wavelet coefficients in the dominant list are compared with Ti to determine their significance and, if significant, their sign. The resulting significance map is coded and sent as a zerotree. The inclusion of the ZTR symbol increases the coding efficiency because the encoder exploits the correlation between image scales.

STEP 4: Four symbols are used to form a code:

1. Zerotree root.

2. Isolated zero.

3. Significant positive.

4. Significant negative.

The EZW technique can be significantly improved by applying entropy coding afterwards to achieve better compression [22].

STEP 5: Define the entropy code.

The low pass follows, where the significant coefficients are detected and refined under the successive approximation quantization (SAQ) approach [21].

STEP 6: Define refinement pass.

STEP 7: The entropy-coded sequence of 1s and 0s is defined, adaptive arithmetic coding (AC) is used, and STOP is sent.
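The threshold selection of STEP 1 and the four symbols of STEP 4 can be sketched as follows. This is a minimal serial Python illustration, not the chapter's parallel implementation, and the coefficient values and descendant lists are hypothetical:

```python
import math

def initial_threshold(coeffs):
    # T0 = 2**floor(log2(max |c|)): the highest power of two
    # not exceeding the maximum coefficient magnitude.
    return 2 ** int(math.floor(math.log2(max(abs(c) for c in coeffs))))

def dominant_symbol(value, descendants, T):
    """Classify one coefficient for the dominant (significance) pass."""
    if abs(value) >= T:
        return 'P' if value >= 0 else 'N'    # significant positive/negative
    if all(abs(d) < T for d in descendants):
        return 'ZTR'                          # zerotree root
    return 'IZ'                               # isolated zero

coeffs = [34, -20, 3, 14, -2, 1, 0, 5]        # hypothetical values
T = initial_threshold(coeffs)                 # max |c| = 34 -> T0 = 32
print(T,
      dominant_symbol(34, [3, -2], T),
      dominant_symbol(3, [1, 0], T),
      dominant_symbol(3, [34], T))            # 32 P ZTR IZ
```

Halving T and repeating the pass over the remaining coefficients yields the embedded (progressively refinable) bitstream described in STEPs 5–7.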

#### **4. Proposed algorithm**

The development of the proposed EZW (embedded zerotree wavelet) image coding has attracted considerable attention among researchers [23–25]. It is the most popular wavelet-based compression algorithm and is widely used in several image-based applications. In this work, the recursive transformation method is used for multi-level decomposition, and the resulting data are then preprocessed before the zerotree compression; the block diagram of a wavelet-based image coding algorithm is shown in **Figure 3**.

The objective of this chapter is to present our proposed algorithm, which enhances the compression of an image while keeping the loss during reconstruction to a minimum.

Our algorithm is applied as a preprocessing stage; this makes it possible to eliminate data in the transformed image that are neither important nor significant for the reconstruction of the image.

However, it is necessary to mention that more bits are required during compression and the processing time increases, so a parallel proposal can help.

When the trade-off between the compression ratio and the reconstructed image quality is exploited to a greater extent, it is possible to eliminate irrelevant data in the image, so that higher compression is achieved with a slight reduction in quality [21].

In a wavelet-transformed image, the coefficients represent a low-resolution image in the LL subband. The high-frequency subbands contain the detail data specific to each direction.


**Figure 3.**

*Block diagram of a wavelet-based image coding algorithm.*


The use of the three detail subbands contributes at a smaller scale to the image reconstruction process, because their coefficients are mostly zero and the few large values correspond to edge information and textures in the image [22].

In this chapter, an algorithm is proposed to reduce the less essential data in order to achieve higher compression while preserving the high-valued coefficients, by eliminating the lowest values of the minimum-weight subband.

Our proposed algorithm uses a weight calculation method to find the minimum-weight subband at each level. The weight of each subband is calculated by adding the absolute values of all the subband coefficients. Only the three detail subbands (LHi, HLi and HHi) are considered, and the one with the minimum weight is examined in detail.

Eq. (1) shows it.

$$W_s = \sum \text{abs}\,(\text{coefficients in subband}), \qquad \min(\text{subband}_i) = \text{subband with minimum weight at level } i \tag{1}$$

After finding the required subband at each level, the algorithm reduces the negligible data in these subbands, depending on the importance of the data for preserving the reconstruction. At this stage, the majority of the values are close to zero, and the smallest coefficient magnitude in each subband is used as a threshold to eliminate the low-valued data in that subband.

The coefficients whose value is higher than the set threshold value are retained, and those whose value is near zero are deleted. In our experiments, two different threshold values were used to show their effect on the compressed output and the reconstructed image.

In the zerotree coding, the reduction of the low-valued significant coefficients in the minimum-weight subbands results in a higher compression ratio with a slight loss in decoded PSNR. Our results show that this algorithm achieves better efficiency at the cost of a negligible loss in picture quality.
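The weight calculation of Eq. (1) and the threshold pruning can be sketched as follows. This is an illustrative Python fragment: the 2 × 2 subbands and the threshold of 0.5 are made up for the example, whereas the chapter derives its thresholds from the data.

```python
import numpy as np

def min_weight_subband(detail_subbands):
    """Eq. (1): weight = sum of absolute coefficient values per subband;
    returns the index of the minimum-weight subband (LH, HL or HH)."""
    weights = [float(np.sum(np.abs(s))) for s in detail_subbands]
    return int(np.argmin(weights)), weights

def prune(subband, threshold):
    """Preprocessing stage: zero out near-zero coefficients
    whose magnitude falls below the threshold."""
    out = subband.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

lh = np.array([[0.1, -0.2], [3.0, 0.0]])   # hypothetical detail subbands
hl = np.array([[5.0, 1.0], [-2.0, 0.3]])
hh = np.array([[0.05, 0.0], [0.1, -0.02]])
idx, weights = min_weight_subband([lh, hl, hh])
print(idx)             # 2 -> HH has the smallest weight
print(prune(lh, 0.5))  # small-magnitude entries become exact zeros
```

The pruned zeros enlarge the zerotrees seen by the EZW pass, which is where the extra compression comes from.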

#### **5. Parallel implementation**

The Epiphany III system uses a memory model and also a programming model, in a flow system that includes single instruction multiple data (SIMD), single program multiple data (SPMD), master–slave programming and multiple instruction multiple data (MIMD).

It can also be programmed for data-flow, static, dynamic, systolic-array, multi-thread, message-passing and communicating sequential processes (CSP) models [26].

First, the operating system performs a review of the hardware of the Epiphany system; it then configures the cores in the mesh topology and distributes the information of the matrices A and B. This is due to the structural nature of the Epiphany system; finally, the tasks are distributed in appropriately small blocks (**Figure 4**).

**Figure 5.** *Execution SPMD.*

For this distribution, the programming model single program multiple data (SPMD) is used [27], which is responsible for distributing the execution for each of the cores, as shown in **Figure 5**.

### **6. Distributions of data and tasks**

Depending on the problem, it is possible to choose between two approaches to decomposition: data decomposition or task decomposition.

Strategies must be established to choose between the decompositions: decomposition of instructions or decomposition of tasks.

Decomposition of instructions: this refers to the distribution of instructions for the cores, which will handle the processing. Then, it describes the process of distributing the instructions that can execute in parallel on different cores.

Execution of this type in parallel is up to n times faster than execution on a single core, except for the minimal delay involved in the initial distribution of the workload or instructions and the final collection of the results, yielding an acceleration that is linear in the number of cores.

Decomposition of tasks: this refers to the distribution of tasks to each of the cores, which will be responsible for processing them. That is to say, the whole program is divided into tasks. A task is a sequence of the program that can be executed in parallel, concurrently with other tasks. This approach is beneficial when the tasks maintain high levels of independence.
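The near-linear acceleration described above can be modelled with a simple formula: the parallel time is the serial work divided over the cores plus a fixed distribution/collection overhead. The timings below are hypothetical, purely for illustration:

```python
def speedup(t_serial, n_cores, t_overhead):
    """Simple model: work divides evenly over n cores, plus a fixed
    distribution/collection overhead; returns the resulting speedup."""
    t_parallel = t_serial / n_cores + t_overhead
    return t_serial / t_parallel

# Hypothetical timings: 16 ms of serial work on 16 cores, 0.1 ms overhead.
print(round(speedup(16.0, 16, 0.1), 2))  # 14.55 -- close to the linear 16x
```

As the overhead tends to zero, the speedup tends to the core count, which is the linear acceleration the text refers to.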


**Figure 6.**

*Data flow in matrix multiplication.*

To exploit the parallelism of the Epiphany hardware platform, it is encouraged that the application be broken down into tasks (code portions); thus, each of the cores can run a part of the program in parallel with the other cores.


**Figure 4.**

*Sequential execution.*


If each task and algorithm are different, then both can be implemented, the functional decomposition, in which the particular characteristics of each type of task or algorithms will be used to execute them in the nucleus for that purpose.

Recall that the new generations of embedded systems integrate circuits with several cores, on which threads (the tasks executed in parallel) run, so applications complete in less time. This increases the parallelism and concurrency available in hardware and software. It is also important to distribute the tasks among the cores and to synchronize the cores with each other.

Matrix operations are among the mathematical tools most widely used by scientists. The algorithm used here for parallel matrix multiplication is described by IBM [28]. The importance of data communication between neighboring cores, following the Cannon algorithm [29], is also considered. The memory of each core represents an implementation challenge because it is limited, which makes it necessary to use the available memory space for communication between the cores. These are important factors for a system dedicated to parallel processing.

**Figure 6** shows the matrix multiplications assigned to each core. It also shows the data sent between cores for the execution of the tasks; the Epiphany III system incorporates a message-passing mechanism to synchronize these transfers, avoiding conflicts between the cores and accesses to the shared global memory.

**Figure 6** also shows the data flow for matrix multiplication, which runs in a specific number of steps determined by √P, where P is the number of cores; the matrix multiplication operates on a data set of size √P × √P.

At each repetition, an element of the product matrix C is obtained; then matrix A moves down and matrix B moves to the right. This example can be programmed using the standard high-level ANSI programming language C. The Epiphany III system provides specific functions that simplify many-core programming, thanks to its open operating system, but their use is not mandatory for programmers.

The implementation of the algorithm on the Epiphany III system, with 16 cores operating at 1 GHz, solves a 128 × 128 matrix multiplication in 2 ms. The performance of the Epiphany III system grows linearly when appropriate programming and data distribution models are used, and this is reflected in the cost/performance of an optimized system [11, 30–33].

To exploit the parallelism of the Epiphany hardware platform, it is encouraged that the application be broken down into tasks (code portions). Thus, each of the cores can run a part of the program in parallel with the other cores.

**Figure 6.** *Data flow in matrix multiplication.*

The decomposition into tasks must be followed by synchronization of the different parts involved to ensure data consistency.

**Figure 6** also represents the multiplication mathematically; in simplified form, it is given by (2).

$$\mathbf{C}_{ij} = \sum_{k=0}^{N-1} \mathbf{A}_{ik}\,\mathbf{B}_{kj} \tag{2}$$

The input matrices are A and B; C is the resulting product matrix. Their elements are indexed by the coordinates (i, j), i.e. (row, column). The procedure to program matrix multiplication on a single core is shown below.


The previous code uses standard C and compiles and executes on a single core.

These matrices A and B are then processed on each of the cores in the system, and the resulting elements of matrix C are placed in the local memory of each core.

#### **7. Experiment results**

Different image outputs of the wavelet-based compression are shown in **Figure 7**.

**Figure 7.** *Output images.*

**Table 1** shows the values obtained when our proposed method was applied; **Table 2** shows the values obtained using adaptive arithmetic coding.

| Image | Min weight subband (L3 / L2 / L1) | EZW method (Bytes / PSNR) | Proposed method, threshold = 2 (Bytes / PSNR) | Proposed method, threshold = 5 (Bytes / PSNR) |
|-------|-----------------------------------|---------------------------|-----------------------------------------------|-----------------------------------------------|
| Lena  | HH / HH / HH                      | 3024 / 22.98 dB           | 2952 / 25.57 dB                               | 2830 / 24.96 dB                               |

**Table 1.**
*Obtained values.*

| Image | Min weight subband (L3 / L2 / L1) | EZW method (Bytes / PSNR) | Proposed method, threshold = 2 (Bytes / PSNR) | Proposed method, threshold = 5 (Bytes / PSNR) |
|-------|-----------------------------------|---------------------------|-----------------------------------------------|-----------------------------------------------|
| Lena  | HH / HH / HH                      | 63,456 / 22.98 dB         | 62,256 / 25.57 dB                             | 61,465 / 24.96 dB                             |

**Table 2.**
*Obtained values using the adaptive arithmetic coding.*
### **8. Conclusion**

The above method exploits the tradeoff between compression ratio and output PSNR and reduces the least essential data in order to attain further compression. A better compression ratio is achieved compared with the original EZW coder after applying the threshold, with a slight reduction in PSNR during reconstruction.

#### **Acknowledgements**

We appreciate the facilities granted for carrying out this work by the Instituto Politécnico Nacional, through the Secretariat of Research and Postgraduate, under the SIP 20194986 and 20195024 projects, and by the Interdisciplinary Unit of Engineering and Social and Administrative Sciences, the Center for Technological Innovation and Development in Computing, and the Digital Technologies Research and Development Center. Likewise, we thank the Program of Stimulus to the Performance of the Researchers (EDI) and the Program of Stimulus COFAA.


### **Author details**

Jesús Antonio Alvarez-Cedillo<sup>1</sup>\*, Teodoro Alvarez-Sanchez<sup>2</sup>, Mario Aguilar-Fernandez<sup>1</sup> and Jacobo Sandoval-Gutierrez<sup>3</sup>

1 Instituto Politécnico Nacional, UPIICSA, Iztacalco, Granjas México, Ciudad de México, Mexico

2 Instituto Politécnico Nacional, CITEDI, Mesa de Otay, Tijuana, Baja California, Mexico

3 El Panteón Lerma de Villada, Municipio de Lerma, Estado de México, Mexico

\*Address all correspondence to: jaalvarez@ipn.mx

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **References**

[1] Yamamoto A. Wavelet analysis: Theory and applications. Transform. 1994. DOI: 10.1051/jp1:1997114

[2] Shensa MJ. The discrete wavelet transform: Wedding the À Trous and Mallat algorithms. IEEE Transactions on Signal Processing. 1992. DOI: 10.1109/78.157290

[3] Tzanetakis G, Essl G, Cook P-R. Audio analysis using the discrete wavelet transform. In: Proceedings of the WSES International Conference Acoustics and Music: Theory and Applications (AMTA 2001). 2001

[4] Filters W, Transforms W. Preview of Wavelets, Wavelet Filters, and Wavelet Transforms. Space & Signals Technologies LLC; 2009

[5] Misiti M, Misiti Y, Oppenheim G, Poggi JM. Wavelets and Their Applications. 2010. DOI: 10.1002/9780470612491

[6] Letelier JC, Weber PP. Spike sorting based on discrete wavelet transform coefficients. Journal of Neuroscience Methods. 2000. DOI: 10.1016/S0165-0270(00)00250-8

[7] Nason GP, Silverman BW. The discrete wavelet transform in S. Journal of Computational and Graphical Statistics. 1994. DOI: 10.1080/10618600.1994.10474637

[8] Goodman RW. Discrete wavelet transforms. In: Discrete Fourier and Wavelet Transforms. 2016. DOI: 10.1142/9789814725781\_0003

[9] Tzanetakis G, Essl G, Cook P-R. Audio analysis using the discrete wavelet transform. In: Proceedings of the WSES International Conference Acoustics and Music: Theory and Applications (AMTA 2001). 2001

[10] Mitra SK. Digital signal processing: A computer-based approach. Microelectronics Journal. 2001. DOI: 10.1016/S0026-2692(98)00072-X

[11] Alvarez-Sanchez T, Alvarez-Cedillo JA, Sandoval-Gutierrez J. Many-core parallel algorithm to correct the gaussian noise of an image. In: Communications in Computer and Information Science. Vol. 948. 2019. DOI: 10.1007/978-3-030-10448-1\_7

[12] Vajda A. Programming Many-Core Chips. 2011. DOI: 10.1007/978-1-4419-9739-5

[13] Boyd-Wickizer S, Clements A, Mao Y, Pesterev A, Kaashoek M, Morris R, et al. An Analysis of Linux Scalability to Many Cores. MIT Web Domain; 2010

[14] Hibbs AE, Thompson KG, French D, Wrigley A, Spears I. Optimizing performance by improving core stability and core strength. Sports Medicine. 2008. DOI: 10.2165/00007256-200838120-00004

[15] Manferdelli JL, Govindaraju NK, Crall C. Challenges and opportunities in many-core computing. Proceedings of the IEEE. 2008. DOI: 10.1109/JPROC.2008.917730

[16] Bates D. Introduction to the Matrix Package. R Core Development Group; 2012. DOI: 10.1016/j.amjmed.2014.09.011

[17] Guz Z, Bolotin E, Keidar I, Kolodny A, Mendelson A, Weiser UC. Many-core vs. many-thread machines: Stay away from the valley. IEEE Computer Architecture Letters. 2009. DOI: 10.1109/L-CA.2009.4

[18] Shapiro JM. Embedded image coding using Zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing. 1993. DOI: 10.1109/78.258085

[19] Said A, Pearlman WA. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology. 1996. DOI: 10.1109/76.499834

[20] Hong ES, Ladner RE. Group testing for image compression. IEEE Transactions on Image Processing. 2002. DOI: 10.1109/TIP.2002.801124

[21] Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Computation. 1997. DOI: 10.1162/neco.1997.9.7.1545

[22] Blanco-Velasco M, Cruz-Roldán F, Moreno-Martínez E, Godino-Llorente JI, Barner KE. Embedded filter bank-based algorithm for ECG compression. Signal Processing. 2008. DOI: 10.1016/j.sigpro.2007.12.006

[23] George R, Manimekalai MAP. A novel approach for image compression using zero tree coding. In: 2014 International Conference on Electronics and Communication Systems, ICECS 2014. 2014. DOI: 10.1109/ECS.2014.6892611

[24] Zhou J, Huang PS, Chiang F-P. Wavelet-based pavement distress image compression and noise reduction. Wavelets XI. 2005. DOI: 10.1117/12.612926

[25] Shen K, Delp EJ. Wavelet based rate scalable video compression. IEEE Transactions on Circuits and Systems for Video Technology. 1999. DOI: 10.1109/76.744279

[26] Duncan R. A survey of parallel computer architectures. Computer. 1990. DOI: 10.1109/2.44900

[27] Omiecinski E. Highly parallel computing. Information and Software Technology. 2003. DOI: 10.1016/0950-5849(90)90035-p


[28] Lord N, Golub GH, Van Loan CF. Matrix computations. The Mathematical Gazette. 2007. DOI: 10.2307/3621013

[29] Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing. 2009. DOI: 10.1016/j.parco.2008.12.006

[30] Haibo SB, Rong C, Yandong C, Kaashoek F, Morris R, Pesterev A, et al. Corey: An Operating System for Many Cores. OSDI '08; 2008

[31] Guz Z, Bolotin E, Keidar I, Kolodny A, Mendelson A, Weiser UC. Many-core vs. many-thread machines: Stay away from the valley. IEEE Computer Architecture Letters. 2009. DOI: 10.1109/L-CA.2009.4

[32] Diaz J, Muñoz-Caro C, Niño A. A survey of parallel programming models and tools in the multi and many-core era. IEEE Transactions on Parallel and Distributed Systems. 2012. DOI: 10.1109/TPDS.2011.308

[33] Hill MD, Marty MR. Amdahl's law in the multicore era. Computer. 2008. DOI: 10.1109/MC.2008.209

#### Chapter 11


## On the Application of Dictionary Learning to Image Compression

Ali Akbari and Maria Trocan

#### Abstract

Signal models are a cornerstone of contemporary signal and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied, which has been proven effective for many signals, such as natural images, and successfully used in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to a dictionary. The dictionary choice plays an important role in the success of the entire model. One main discipline of dictionary design is based on a machine learning methodology, which provides a simple and expressive structure for designing adaptable and efficient dictionaries. This chapter focuses on a direct application of the sparse representation, i.e. image compression. Two image codecs based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform the existing image coding standards, such as JPEG and JPEG2000.

Keywords: image compression, dictionary learning, sparse representation

#### 1. Introduction

Signal models are fundamental tools for efficient processing of the signals of interest, including audio recordings, natural images, video clips, and medical scans, to name just a few. A signal model formulates a mathematical description of the family of signals of interest in a way that faithfully captures their behaviour. Designing accurate signal models, which efficiently capture useful characteristics of the signals, has been a crucial aim in the signal processing area for many years, and a variety of mathematical forms has been proposed. Sparsity-based modelling has been used in many applications in which each signal is represented as a linear combination of an underlying set of elementary signals, called a dictionary, whose members are known as atoms, resulting in simple and compact models. The driving force behind this model is sparsity, i.e. the rapid decay of the representation coefficients over the dictionary. In this signal modelling, the dictionary plays an important role in the success of the entire model in providing an efficient representation of the signal.

Finding appropriate dictionaries with good predictive power for various signal classes of interest and high compactness ability, especially for natural images, has been an active field of research during the past decades. The early attempts at designing dictionaries were based on building the model using harmonic analysis of the signal classes and extracting some mathematical functions, resulting in a fixed off-the-shelf dictionary called an analytic or mathematical dictionary. The sparse

representation using these fixed mathematical dictionaries is called analysis sparse modelling. The long series of works on designing analytic dictionaries led to various transforms, such as the Fourier transform and its discrete version, the discrete cosine, wavelet, curvelet, contourlet, bandlet, and steerable wavelet transforms.

A significantly different approach to sparse modelling, originally introduced by Olshausen and Field [1], consists of learning a dictionary from some training data. The sparse representation using this trained dictionary is called synthesis sparse modelling. The trained dictionaries, also called synthetic dictionaries, can efficiently capture the underlying structures in natural image patches and are well adapted to a large class of natural signals.

The analysis and synthesis sparse signal modelling has led to the design of effective algorithms for many image-processing applications, such as compression [2–8] and solving inverse problems [9–22]. A straightforward application of sparse signal modelling in the field of image processing has been image compression, since it provides a compact representation of the signal. This chapter presents the theoretical and practical aspects of using dictionary-based sparse signal modelling for image compression. We address one important question: how can an image be efficiently represented over a trained dictionary in order to improve the performance of image compression? Based on this concept, we present two novel image compression methods. Experimental results show that the introduced methods outperform the existing image coding standards, such as JPEG and JPEG2000.

The remainder of the chapter is organized as follows: it starts by introducing the concept of sparsity in signal processing. The use of sparse representation modelling with respect to trained dictionaries is then presented. Next, a brief description of a well-known and effective dictionary learning algorithm, introduced by Aharon et al. [23], is given. The end of the chapter is devoted to introducing two generic image compression schemes. Finally, we conclude the chapter.

#### 2. Sparsity-based signal models

One of the well-known methods for designing signal models is linear approximation. In this modelling technique, given a set of vectors $\{\mathbf{d}_k \in \mathbb{R}^N\}_{k=0}^{K-1}$, a signal $\mathbf{x} \in \mathbb{R}^N$ is represented as a linear combination of $K$ basis vectors,

$$\mathbf{x} \simeq \sum_{k=0}^{K-1} c_k \mathbf{d}_k, \tag{1}$$

where the set $\{c_k\}_{k=0}^{K-1}$ consists of the representation coefficients. The signal approximation Eq. (1) can be reformulated in matrix form as

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}, \tag{2}$$

where $\mathbf{c} = [c_1, c_2, \cdots, c_K]^T \in \mathbb{R}^K$ is the coefficients vector. The matrix $\mathbf{D} = [\mathbf{d}_1\,\mathbf{d}_2 \cdots \mathbf{d}_K] \in \mathbb{R}^{N \times K}$ is called the dictionary, and its columns constitute the dictionary atoms.

With the right choice of dictionary, the coefficients vector $\mathbf{c}$ is expected to be sparse, in the sense that its sorted coefficients decay rapidly. Motivated by this idea, the design of efficient complete dictionaries, i.e. with $K = N$, was an active area of research during the last decades of the twentieth century. The well-known Fourier transform [24], used in the JPEG compression standard [25], and the wavelet transform [2], used in the JPEG2000 compression standard [26], are the results of these works, which uniformly sparsify smooth signals. However, this signal modelling loses its optimality in the representation of image signals due to the existence of curve singularities (elongated edges) in these types of signals [27]. As an instance, images encoded by the JPEG2000 standard suffer from ringing (smoothing) artefacts near edges.

In an attempt to minimize this weakness of the dictionaries, the design of more general over-complete dictionaries, which have more atoms than the dimension of the signal, i.e. $K > N$, has been investigated over the past decades and is still intensely ongoing. These dictionaries have a more descriptive ability to represent a wide range of interesting signals, in comparison with the invertible complete dictionaries.

*On the Application of Dictionary Learning to Image Compression*
*DOI: http://dx.doi.org/10.5772/intechopen.89114*

#### 2.1 Sparse modelling using over-complete dictionaries

Compared to the complete case, representation with over-complete dictionaries must be more carefully defined. There are two distinct paths for representing a signal using over-complete dictionaries: the analysis path and the synthesis path. The analysis sparse modelling relies on the classical basics of signal modelling, in which the representation of the signal is identified as a linear combination of atoms,

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}_a, \tag{3}$$

where the coefficients vector $\mathbf{c}_a$ is obtained via the inner products of the signal and the dictionary, $\mathbf{c}_a = \mathbf{\Omega}\mathbf{x} = \mathbf{D}^T\mathbf{x}$. This method has the advantage of providing a simple and efficient way to achieve a sparse representation over the dictionary. In this case, every signal has a unique representation as a linear combination of the dictionary atoms.

Increasing sparsity, in order to obtain a well-defined representation, requires a departure from this linear representation towards a more flexible and non-linear representation. Each signal is represented using a different set of atoms from a pool, called a dictionary, in order to achieve the best sparsity. Thus, the approximation process becomes

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}_s, \tag{4}$$

where the coefficients vector $\mathbf{c}_s$ is obtained via a non-linear approach, in contrast to the linear approach of the analysis path. This signal modelling approach, called synthesis sparse representation, needs further refinement to find the well-defined representation, due to the degrees of freedom identified by the null-space of $\mathbf{D}$ [27], which leads to a non-unique choice of $\mathbf{c}_s$ in Eq. (4), as opposed to the analysis sparse modelling, which has a unique solution. In order to find the most informative representation, the coefficients vector $\mathbf{c}_s$ is obtained with respect to some cost function $F(\cdot)$, which minimizes the sparsity of the coefficients vector $\mathbf{c}_s$ under a reconstruction constraint:

$$\mathbf{c}_s = \arg\min_{\mathbf{c} \in \mathbb{R}^K} F(\mathbf{c}) \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{D}\mathbf{c}\|_2^2 \leq \epsilon, \tag{5}$$

where $\epsilon$ is the prior knowledge about the noise level. The penalty function $F(\cdot)$ is defined in a way that is tolerant of large coefficients and aggressively penalizes small coefficients [27]. The usual choice for this function is the $\ell_p$ norm with $0 \le p \le 1$. Of specific interest is the $\ell_0$ case, i.e. $F(\mathbf{c}) = \|\mathbf{c}\|_0$, which counts the number of non-zeros in the representation. For this case, the problem Eq. (5) becomes

representation using these fixed mathematical dictionaries is called analysis sparse modelling. The long series of works on designing the analytic dictionaries lead to appearing various transforms such as Fourier and its discrete version, discrete cosine, wavelets, curvelets, contourlets, bandlets, and steerable wavelets.

A significantly different approach to the sparse modelling, originally introduced by Olshausen and Field [1], consists of learning a dictionary from some training data. The sparse representation using this trained dictionary is called synthesis sparse modelling. The trained dictionaries, also called synthetic dictionaries, can efficiently capture the underlying structures in natural image patches and are well adapted to a large class of natural signals.

The analysis and synthesis sparse signal modelling has led to the design of effective algorithms for many image-processing applications, such as compression [2–8] and solving inverse problems [9–22]. A straightforward application of sparse signal modelling in the field of image processing has been image compression, since it provides a compact representation of the signal. This chapter presents the theoretical and practical aspects of using dictionary-based sparse signal modelling for image compression. We address one important question: how can an image be efficiently represented over a trained dictionary in order to improve the performance of image compression? Based on this concept, we present two novel image compression methods. Experimental results show that the introduced methods outperform the existing image coding standards, such as JPEG and JPEG2000. The remainder of the chapter is organized as follows. It starts with introducing the concept of sparsity in signal processing. The use of sparse representation modelling with respect to trained dictionaries is then presented, followed by a brief description of a well-known and effective dictionary learning algorithm introduced by Aharon et al. [23]. The end of the chapter is devoted to introducing two generic image compression schemes. Finally, we conclude the chapter.

#### 2. Sparsity-based signal models

One of the well-known methods in designing signal models is linear approximation. In this modelling technique, given a set of vectors $\{\mathbf{d}_k \in \mathbb{R}^N\}_{k=0}^{K-1}$, a signal $\mathbf{x} \in \mathbb{R}^N$ is represented as a linear combination of $K$ basis vectors,

$$\mathbf{x} \simeq \sum_{k=0}^{K-1} c_k \mathbf{d}_k, \tag{1}$$

where the set $\{c_k\}_{k=0}^{K-1}$ consists of the representation coefficients. The signal approximation Eq. (1) can be reformulated in matrix form as

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}, \tag{2}$$

where $\mathbf{c} = [c_0, c_1, \cdots, c_{K-1}]^T \in \mathbb{R}^K$ is the coefficients vector. The matrix $\mathbf{D} = [\mathbf{d}_0 \, \mathbf{d}_1 \cdots \mathbf{d}_{K-1}] \in \mathbb{R}^{N \times K}$ is called the dictionary and its columns constitute the dictionary atoms.

With the right choice of dictionary, the coefficients vector $\mathbf{c}$ is expected to be sparse, in the sense that its sorted coefficients decay rapidly. Motivated by this idea, the design of efficient complete dictionaries, i.e. with $K = N$, was an active area of research during the last decades of the twentieth century. The well-known Fourier transform [24], used in the JPEG compression standard [25], and the wavelet transform [2], used in the JPEG2000 compression standard [26], are the results of these works, which sparsify the smooth signals uniformly. However, this signal modelling loses its optimality in the representation of image signals due to the existence of curve singularities (elongated edges) in these types of signals [27]. For instance, the images encoded by the JPEG2000 standard suffer from ringing (smoothing) artefacts near edges.
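To make the matrix form of Eq. (2) and the sparsity idea concrete, here is a minimal NumPy sketch; the signal size and the choice of an orthonormal dictionary are illustrative assumptions, not part of the chapter's method:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 16, 16                                      # complete case: K == N
D = np.linalg.qr(rng.standard_normal((N, K)))[0]   # orthonormal dictionary

# Build a signal that is exactly a combination of 3 atoms (a sparse c)
c_true = np.zeros(K)
c_true[[2, 7, 11]] = [1.5, -0.8, 0.3]
x = D @ c_true                                     # x = D c  (Eq. (2), exact here)

# With an orthonormal complete dictionary, the coefficients are recovered
# simply by inner products: c = D^T x
c = D.T @ x
print(np.allclose(c, c_true))                      # True
```

The same inner-product recovery no longer yields a unique sparse `c` once `K > N`, which is exactly the over-complete setting discussed next.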

In an attempt to overcome this weakness of complete dictionaries, the design of more general over-complete dictionaries, which have more atoms than the dimension of the signal, i.e. $K > N$, has been investigated over the past decades and is still intensely ongoing. These dictionaries have a greater descriptive ability to represent a wide range of interesting signals, in comparison with the invertible complete dictionaries.

#### 2.1 Sparse modelling using over-complete dictionaries

Compared to the complete case, representation with over-complete dictionaries must be defined more carefully. There are two distinct paths for representing a signal using over-complete dictionaries: the analysis path and the synthesis path. The analysis sparse modelling relies on the classical basics of signal modelling, in which the representation of the signal is identified as a linear combination of atoms,

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}_a, \tag{3}$$

where the coefficients vector $\mathbf{c}_a$ is obtained via the inner products of the signal and the dictionary, $\mathbf{c}_a = \mathbf{\Omega}\mathbf{x} = \mathbf{D}^T\mathbf{x}$. This method has the advantage of providing a simple and efficient way to achieve a sparse representation over the dictionary. In this case, every signal has a unique representation as a linear combination of the dictionary atoms.

Increasing sparsity, in order to obtain a well-defined representation, requires departure from this linear representation towards a more flexible and non-linear representation. Each signal is represented using a different set of atoms from a pool, called dictionary, in order to achieve the best sparsity. Thus, the approximation process becomes

$$\mathbf{x} \simeq \mathbf{D}\mathbf{c}_s, \tag{4}$$

where the coefficients vector $\mathbf{c}_s$ is obtained via a non-linear approach, in contrast to the linear approach of the analysis path. This signal modelling approach, called synthesis sparse representation, needs further refinement to find a well-defined representation, due to the degrees of freedom induced by the null-space of $\mathbf{D}$ [27], which lead to a non-unique choice of $\mathbf{c}_s$ in Eq. (4), as opposed to the analysis sparse modelling, which has a unique solution. In order to find the most informative representation, the coefficients vector $\mathbf{c}_s$ is obtained with respect to some sparsity-measuring cost function $F(\cdot)$, minimized under a reconstruction constraint:

$$\mathbf{c}_s = \underset{\mathbf{c} \in \mathbb{R}^{K}}{\arg\min}\; F(\mathbf{c}) \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{D}\mathbf{c}\|_2^2 \leq \epsilon, \tag{5}$$

where $\epsilon$ reflects the prior knowledge about the noise level. The penalty function $F(\cdot)$ is defined so as to be tolerant of large coefficients while aggressively penalizing small ones [27]. The usual choice for this function is the $\ell_p$ norm with $0 \le p \le 1$. Of specific interest is the $\ell_0$ case, i.e. $F(\mathbf{c}) = \|\mathbf{c}\|_0$, which counts the number of non-zeros in the representation. For this case, the problem of Eq. (5) becomes

$$\mathbf{c}_s = \underset{\mathbf{c} \in \mathbb{R}^{K}}{\arg\min}\; \|\mathbf{c}\|_0 \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{D}\mathbf{c}\|_2^2 \leq \epsilon. \tag{6}$$

On the Application of Dictionary Learning to Image Compression
DOI: http://dx.doi.org/10.5772/intechopen.89114

This problem, known to be NP-hard in general, can be efficiently approximated based on the idea of iterative greedy pursuit. The earliest, and still effective, such method is orthogonal matching pursuit (OMP) [28].
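As a concrete illustration, a minimal OMP in NumPy; the stopping rule follows the constraint of Eq. (6), but this is a textbook-style sketch, not the reference implementation of [28]:

```python
import numpy as np

def omp(D, x, eps):
    """Greedy OMP: pick the atom most correlated with the residual,
    re-fit all selected atoms by least squares, and repeat until
    ||x - Dc||_2^2 <= eps (the constraint of Eq. (6))."""
    N, K = D.shape
    c = np.zeros(K)
    support = []
    residual = x.copy()
    while residual @ residual > eps and len(support) < K:
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if k in support:                             # residual carries no new info
            break
        support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        c[:] = 0.0
        c[support] = coeffs                          # orthogonal re-projection
        residual = x - D @ c
    return c

# Sanity check on a synthetic 2-sparse signal
rng = np.random.default_rng(1)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
c_true = np.zeros(64)
c_true[[3, 40]] = [2.0, -1.0]
x = D @ c_true
c_hat = omp(D, x, eps=1e-10)
print(np.linalg.norm(x - D @ c_hat))                 # tiny residual
```

The least-squares re-fit over the whole support is what distinguishes *orthogonal* MP from plain matching pursuit.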

This formulation, Eq. (6), has gained large success beyond the statistics and signal processing communities and has been extensively employed in different signal processing algorithms. In image-processing applications, since the size of natural images is too large, the image is partitioned into blocks and the sparse modelling is done on the set of image blocks $\mathbf{X} = [\mathbf{x}_1 \, \mathbf{x}_2 \cdots \mathbf{x}_L]$, each of size $\sqrt{N} \times \sqrt{N}$ pixels, where $\sqrt{N}$ is an integer value and $\mathbf{x}_i \in \mathbb{R}^N$ is the lexicographically stacked representation of the $i$-th image patch.
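The blockwise vectorization just described might look as follows; the tiny 4×4 image and the block size are illustrative choices:

```python
import numpy as np

def image_to_patches(img, B):
    """Split an (H, W) image into non-overlapping B x B blocks and
    lexicographically stack each block into a column of X (N = B*B)."""
    H, W = img.shape
    assert H % B == 0 and W % B == 0, "image dimensions must be multiples of B"
    cols = []
    for i in range(0, H, B):
        for j in range(0, W, B):
            cols.append(img[i:i + B, j:j + B].reshape(-1))  # row-major stacking
    return np.stack(cols, axis=1)      # X has shape (N, L)

img = np.arange(16, dtype=float).reshape(4, 4)
X = image_to_patches(img, B=2)
print(X.shape)                         # (4, 4): N = 4 pixels per patch, L = 4 patches
print(X[:, 0])                         # first patch: [0. 1. 4. 5.]
```

The decoder side simply reverses the reshape to tile the reconstructed patches back into the image.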

#### 2.2 Dictionary choice

In the discussion so far, it is assumed that the dictionaries of the analysis and synthesis models are known. Choosing the dictionary carefully is an important and involved task, into which substantial research has been invested. Based on the analysis and synthesis models, the scientific community has developed two main routes for designing the dictionaries [27].

The first route yields analytic dictionaries, derived from a set of mathematical assumptions made on the family of signals. Dictionaries of this type are generated by finding appropriate mathematical functions, through harmonic analysis of the signals of interest, for which an efficient representation is obtained. For instance, the Fourier basis is designed for optimal representation of smooth signals, while the wavelet dictionary is more suitable for piecewise-smooth signals with point singularities.

The design of analytic over-complete dictionaries is formulated as $\mathbf{D}\mathbf{D}^T\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$. The approach then tries to establish an appropriate dictionary by analysing the behaviour of $\mathbf{D}^T\mathbf{x}$ and establishing a decay rate. The curvelet [29], contourlet [30], and bandlet [31] transforms are some of the analytic dictionaries which provide comprehensive frameworks for handling multidimensional signals.

Finding a more compact sparse representation has been a major driving force for the continued development of more efficient dictionaries. The synthesis formulation of the sparse representation paved the way for the design of efficient dictionaries, called synthetic dictionaries, learned from signal realizations via machine-learning techniques. The basic assumption behind this approach is that the structure of complex natural signals can be more accurately extracted directly from the data than by using a general mathematical model [27]. In fact, this approach replaces prior assumptions on the signal behaviour with a training process which constructs the dictionary based on the observed signal properties. Compared to the analytic dictionaries, the synthetic dictionaries deliver increased flexibility and the ability to adapt to specific signals, and are superior in terms of representation efficiency, at the cost of a non-structured and substantially more complex dictionary.

In this approach, a dictionary is trained for the sparse representation of small patches collected from a number of training signals. The desire to efficiently train a dictionary for sparse representation has led to the development of several algorithms [1, 23, 32–34]. The earlier works on dictionary learning mostly focused on statistical methods. Given the training image patches $\mathbf{X} = [\mathbf{x}_1 \, \mathbf{x}_2 \cdots \mathbf{x}_L]$, where $L$ is


the number of training patches, this method finds a dictionary which either maximizes the likelihood of the training data $P(\mathbf{X}|\mathbf{D})$ [35] or the posterior probability of the dictionary $P(\mathbf{D}|\mathbf{X})$ [36]. These formulations lead to optimization problems that are solved in an expectation-maximization fashion, alternating between estimation of the sparse representations and of the dictionary using gradient descent or similar methods.

In the neuroscience area, Olshausen and Field [1] proposed a significantly different approach for designing the dictionary using the training data, inspired by modelling the receptive fields of simple cells in the mammalian primary visual cortex. In another attempt, the K-SVD algorithm introduced by Aharon et al. [23] became one of the best-known dictionary learning methods. Given a set of examples $\mathbf{X} = [\mathbf{x}_1 \, \mathbf{x}_2 \cdots \mathbf{x}_L]$, the goal of the K-SVD algorithm is to search for the best possible dictionary $\mathbf{D} \in \mathbb{R}^{N \times K}$ for the sparse representation of the training set $\mathbf{X}$ through the optimization problem of Eq. (7):

$$\underset{\mathbf{C},\,\mathbf{D}}{\arg\min} \sum_{i=1}^{L} \|\mathbf{c}_i\|_0 \quad \text{subject to} \quad \|\mathbf{X} - \mathbf{D}\mathbf{C}\|_2^2 \leq \epsilon, \tag{7}$$

where $\epsilon$ is a fixed small value and $\mathbf{C} = [\mathbf{c}_1 \, \mathbf{c}_2 \cdots \mathbf{c}_L]$ is a matrix of size $K \times L$, consisting of the representation coefficients vectors $\{\mathbf{c}_i\}_{i=1}^{L}$ of the training samples. The optimization in Eq. (7) is performed iteratively. First, considering an initial dictionary, the algorithm finds the best coefficients matrix $\mathbf{C}$ it can: once $\mathbf{D}$ is fixed, the problem posed in Eq. (7) reduces to a set of $L$ sparse representation problems of the form of Eq. (6) (see Section 2.1). The OMP algorithm [28] is used for the near-optimal calculation of the coefficients matrix $\mathbf{C}$.

At the next stage, the columns of the dictionary are sequentially updated and the relevant coefficients in the matrix $\mathbf{C}$ are simultaneously changed. One column is updated at a time, and the update is based on a singular value decomposition (SVD) of the residual data matrix, computed only on the training samples that use this atom. The K-SVD algorithm includes a mechanism to control and rescale the $\ell_2$-norm of the dictionary elements; indeed, without such a mechanism, the norm of $\mathbf{D}$ would grow arbitrarily. For more details, refer to [23].
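One atom update of the kind just described can be sketched as follows; this is a simplified illustration of the K-SVD step, not the authors' reference code:

```python
import numpy as np

def ksvd_atom_update(D, C, X, k):
    """Update dictionary column k and the matching coefficient row,
    using a rank-1 SVD of the residual restricted to the training
    samples that actually use atom k (a sketch of the K-SVD step)."""
    users = np.nonzero(C[k, :])[0]            # samples whose code uses atom k
    if users.size == 0:
        return D, C                           # unused atom: leave unchanged
    # Error matrix without atom k's contribution, on those samples only
    E = X[:, users] - D @ C[:, users] + np.outer(D[:, k], C[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                         # best rank-1 atom (unit norm)
    C[k, users] = s[0] * Vt[0, :]             # matching rescaled coefficients
    return D, C

# Tiny demo: one update never worsens the fit (rank-1 SVD optimality)
rng = np.random.default_rng(2)
X = rng.standard_normal((8, 20))
D = rng.standard_normal((8, 5))
D /= np.linalg.norm(D, axis=0)
C = rng.standard_normal((5, 20))
err_before = np.linalg.norm(X - D @ C)
D, C = ksvd_atom_update(D, C, X, 0)
print(np.linalg.norm(X - D @ C) <= err_before)   # True
```

A full K-SVD iteration alternates an OMP sparse-coding pass with this update applied to every column in turn.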

In image-processing applications, since the size of natural images is too large for learning a full matrix $\mathbf{D}$, the dictionary is instead learned on a set of natural image patches $\mathbf{X} = [\mathbf{x}_1 \, \mathbf{x}_2 \cdots \mathbf{x}_L]$, each of size $\sqrt{N} \times \sqrt{N}$ pixels, where $\sqrt{N}$ is an integer value and $\mathbf{x}_i \in \mathbb{R}^N$ is the lexicographically stacked representation of the $i$-th image patch.

#### 2.3 Analysis versus synthesis

As mentioned before and also outlined in [27], some of the most important elements of effective dictionary design include localization, multi-resolution, and adaptivity. Modern dictionaries typically provide localization in both the analysis and synthesis routes. However, the multi-resolution property is usually better supported by the analytic structures, whereas adaptivity is mostly found in the synthetic methods.

The most important advantage of the analytic dictionaries is their easy and fast implementation. On the other hand, the main advantage of the trained dictionaries is their ability to provide a much higher degree of specificity to the particular signal properties, allowing them to produce better results in many practical applications such as image compression, feature extraction, content-based image retrieval and others. However, some drawbacks arise from the compactness introduced by the synthesis scheme. Since only a few atoms represent each signal, the importance of every individual atom is large. Consequently, any wrong choice of one atom can lead to additional erroneous atoms being selected as compensation, deviating further from the desired reconstruction. This weakness usually appears in the $\ell_0$-norm-based non-convex optimization problem of Eq. (6). Convex relaxation from $\ell_0$ to $\ell_1$ is more stable for the sparse representation, at the expense of computational complexity. In the analysis formulation, however, all atoms take an equal part in describing the signal, thus minimizing the dependence on each individual one and stabilizing the recovery process.
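The $\ell_1$ relaxation mentioned above is commonly solved with proximal gradient methods; below is a minimal ISTA (iterative soft-thresholding) sketch, where the regularization weight and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def ista(D, x, lam, n_iter=500):
    """Minimize 0.5*||x - Dc||_2^2 + lam*||c||_1 by iterative
    soft-thresholding (the basic proximal gradient method)."""
    L = np.linalg.norm(D, ord=2) ** 2      # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ c - x)              # gradient of the quadratic term
        z = c - g / L                      # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return c

rng = np.random.default_rng(3)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
c_true = np.zeros(64)
c_true[[5, 20, 50]] = [1.0, -2.0, 0.5]
x = D @ c_true
c_hat = ista(D, x, lam=0.05)
print(np.count_nonzero(np.abs(c_hat) > 1e-3))   # only a few active atoms
```

Unlike the greedy $\ell_0$ pursuit, a mildly wrong atom here only shrinks toward zero rather than forcing a cascade of compensating selections.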

In this section, an adaptive sparse representation approach is presented. It is based on this fact that the visual significance, called visual saliency, of each image patch varies with its location within the image [41]. In other words, the human visual system (HVS) usually focuses on some parts of the image (salient regions), while other parts of the image have a lower level of visual interest. Therefore, designing an adaptive sparse representation scheme by considering the HVS characteristics plays an important role in designing an efficient image compression algorithm.

Figure 1 presents the block diagram of the dictionary learning-based image coding (DLC) framework [7]. The DLC mainly has four main parts, pre-processing, dictionary learning, adaptive sparse representation and entropy coding. First, at the pre-processing step, the input image is divided into L non-overlapping image

of size N. As other coding algorithms, the mean values of image patches (DC

<sup>i</sup>¼<sup>1</sup> and AC components <sup>Y</sup> <sup>¼</sup> <sup>y</sup><sup>i</sup>

N � K learned by the K-SVD dictionary learning algorithm [23]. As other dictionary-based image compression methods, the trained dictionary should be shared between encoder and decoder. The OMP algorithm is used for sparse representation step in an adaptive manner using the visual saliency information. Incorporating this adaptive sparse representation step into the image coding framework aims to further reduce the reconstructed errors. Finally, the DC elements M ¼

The AC components are represented with respect to a trained dictionary D of size

denotes the coefficients vector of the i-th image patch. In the following sections, the adaptive sparse representation and entropy coding steps are detailed. We ignore the details of decoder. However, it should be noted that the decoder can retrieve the

The sparse representation step has a strong impact on effectively encoding of

image patches and thus the rate-distortion performance. Graph-based visual saliency (GBVS) model, proposed in [42], has an capability to extract the saliency map of an image, i.e. locations within an image where have a high visual interest to a human observer. By incorporating this model into the sparse representation step, we build up an image coding scheme which compresses the image more efficiently than the traditional coders. Please refer to [42] for the details of the GVBS model. This saliency map of the image is exploited to determine the visual significance of the image patches in order to allocate different sparsity levels to each patch

<sup>i</sup>¼<sup>1</sup>. <sup>x</sup><sup>i</sup> <sup>∈</sup> <sup>N</sup> represents a <sup>B</sup> � <sup>B</sup> image patch vectorized into a vector

<sup>L</sup>

<sup>i</sup>¼<sup>1</sup> are separately encoded.

<sup>i</sup>¼<sup>1</sup> are entropy coded, where <sup>c</sup><sup>i</sup> <sup>∈</sup> <sup>K</sup>

3.1 Dictionary learning-based image codec

DOI: http://dx.doi.org/10.5772/intechopen.89114

On the Application of Dictionary Learning to Image Compression

<sup>i</sup>¼<sup>1</sup> and representation coefficients <sup>C</sup> <sup>¼</sup> <sup>c</sup><sup>i</sup> ½ �<sup>L</sup>

image by a minor application of the above steps.

according to its visual significance to the HVS.

Block diagram of the dictionary-based image coding (DLC) framework.

3.1.1 Adaptive synthesis sparse representation

patches <sup>X</sup> <sup>¼</sup> <sup>x</sup><sup>i</sup> ½ �<sup>L</sup>

mi ½ �<sup>L</sup>

Figure 1.

179

components) <sup>M</sup> <sup>¼</sup> mi ½ �<sup>L</sup>

#### 3. Image compression

Reducing the cost for storage or transmission of image signals with negligible degradation in the quality is the main goal of an lossy image compression algorithm. It tries to remove the redundancies among the image data by adopting different ways, such as transferring the image into a transform domain with compressible coefficients. In the transform domain, a few significant coefficients capture a large part of the image information. A typical lossy image compression algorithm usually encodes these significant transform coefficients to reduce the requirements for image storage or transmission. The analysis and synthesis sparse modellings are two powerful tools for transforming the image into a compressible domain. The JPEG [25] and JPEG2000 standards [26] are the results of using the analysis sparse representation of the image by designing analytic dictionaries, e.g. discrete cosine transform (DCT) and discrete wavelet transform (DWT), respectively. Since analytic dictionaries-based image sparse representation is typically over-simplistic, it fails to represent the high-textured images efficiently [27]. Due to this weakness of analysis sparse modelling in the efficient expressiveness, an extensive body of literature has recently focused on various applications of the synthesis sparse signal modelling via a trained dictionary. In this way, the performance can be significantly improved for the image compression application, benefiting the sparse representation of the image over a dictionary specifically adapted to it.

In order to improve the limitations of the traditional sparse representation approaches over a trained dictionary, several studies have been proposed in the literature [37–39]. In [37], the authors train a set of dictionaries. In order to improve the compression performance, the sparse representation is achieved by choosing an optimal dictionary among the trained dictionaries. In [38], a set of dictionaries in a tree structure is trained. At each level of tree, a dictionary is learned via generating the image residual. The residual is obtained by difference between the original and recovered images using the trained dictionary at the previous level of tree. Sparse representation is done by selection of one atom from each dictionary at each tree level. The total number of atoms is determined according to the sparsity level. Authors in [39] introduce the concept of multi-sample sparse representation (MSR) and incorporate it into the dictionary learning process. Each image patch is encoded with a certain sparsity level. To do this purpose, multiple neighbouring image patches are considered during the sparse representation to explore different sparsity levels. Based on this concept, an MSR-based image coding approach is proposed in [39].

Using a trained dictionary, each image is represented over the dictionary. The coefficients vector can be obtained via different algorithms such as the basis pursuit algorithms, matching pursuit techniques and other schemes [40]. These conventional approaches usually consider constant sparsity level, i.e. a fixed number of dictionary atoms is considered for representing all the image patches with different characteristics. This approach leads to a weak image compression performance.

In this section, an adaptive sparse representation approach is presented. It is based on this fact that the visual significance, called visual saliency, of each image patch varies with its location within the image [41]. In other words, the human visual system (HVS) usually focuses on some parts of the image (salient regions), while other parts of the image have a lower level of visual interest. Therefore, designing an adaptive sparse representation scheme by considering the HVS characteristics plays an important role in designing an efficient image compression algorithm.

#### 3.1 Dictionary learning-based image codec

synthesis scheme. In this case, only if a few atoms for presenting a signal, the importance of all atoms largely varies. Consequently, any wrong choice of one atom could potentially lead to additional erroneous atoms that are selected as compensation, deviating further from the desired reconstruction. This weakness is usually appeared in the ℓ0-norm-based non-convex optimization problem of Eq. (6). The convex relaxation approaches from ℓ<sup>0</sup> to ℓ<sup>1</sup> are more stable for the sparse representation at the expense of computational complexity. In the analysis formulation, however, all atoms take an equal part in describing the signal, thus minimizing the

dependence on each individual one, and stabilizing the recovery process.

tion of the image over a dictionary specifically adapted to it.

In order to improve the limitations of the traditional sparse representation approaches over a trained dictionary, several studies have been proposed in the literature [37–39]. In [37], the authors train a set of dictionaries. In order to improve the compression performance, the sparse representation is achieved by choosing an optimal dictionary among the trained dictionaries. In [38], a set of dictionaries in a tree structure is trained. At each level of tree, a dictionary is learned via generating the image residual. The residual is obtained by difference between the original and recovered images using the trained dictionary at the previous level of tree. Sparse representation is done by selection of one atom from each dictionary at each tree level. The total number of atoms is determined according to the sparsity level. Authors in [39] introduce the concept of multi-sample sparse representation (MSR) and incorporate it into the dictionary learning process. Each image patch is encoded with a certain sparsity level. To do this purpose, multiple neighbouring image patches are considered during the sparse representation to explore different sparsity levels. Based on this concept, an MSR-based image coding approach is proposed in [39]. Using a trained dictionary, each image is represented over the dictionary. The coefficients vector can be obtained via different algorithms such as the basis pursuit algorithms, matching pursuit techniques and other schemes [40]. These conventional approaches usually consider constant sparsity level, i.e. a fixed number of dictionary atoms is considered for representing all the image patches with different characteristics. This approach leads to a weak image compression performance.

Reducing the cost for storage or transmission of image signals with negligible degradation in the quality is the main goal of an lossy image compression algorithm. It tries to remove the redundancies among the image data by adopting different ways, such as transferring the image into a transform domain with compressible coefficients. In the transform domain, a few significant coefficients capture a large part of the image information. A typical lossy image compression algorithm usually encodes these significant transform coefficients to reduce the requirements for image storage or transmission. The analysis and synthesis sparse modellings are two powerful tools for transforming the image into a compressible domain. The JPEG [25] and JPEG2000 standards [26] are the results of using the analysis sparse representation of the image by designing analytic dictionaries, e.g. discrete cosine transform (DCT) and discrete wavelet transform (DWT), respectively. Since analytic dictionaries-based image sparse representation is typically over-simplistic, it fails to represent the high-textured images efficiently [27]. Due to this weakness of analysis sparse modelling in the efficient expressiveness, an extensive body of literature has recently focused on various applications of the synthesis sparse signal modelling via a trained dictionary. In this way, the performance can be significantly improved for the image compression application, benefiting the sparse representa-

#### 3. Image compression


Figure 1 presents the block diagram of the dictionary learning-based image coding (DLC) framework [7]. The DLC framework has four main parts: pre-processing, dictionary learning, adaptive sparse representation and entropy coding. First, at the pre-processing step, the input image is divided into L non-overlapping image patches $X = [x_i]_{i=1}^{L}$, where $x_i \in \mathbb{R}^N$ represents a $B \times B$ image patch vectorized into a vector of size N. As in other coding algorithms, the mean values of the image patches (DC components) $M = [m_i]_{i=1}^{L}$ and the AC components $Y = [y_i]_{i=1}^{L}$ are encoded separately. The AC components are represented with respect to a trained dictionary $D$ of size $N \times K$ learned by the K-SVD dictionary learning algorithm [23]. As in other dictionary-based image compression methods, the trained dictionary must be shared between the encoder and the decoder. The OMP algorithm is used for the sparse representation step in an adaptive manner using visual saliency information. Incorporating this adaptive sparse representation step into the image coding framework further reduces the reconstruction errors. Finally, the DC elements $M = [m_i]_{i=1}^{L}$ and the representation coefficients $C = [c_i]_{i=1}^{L}$ are entropy coded, where $c_i \in \mathbb{R}^K$ denotes the coefficient vector of the i-th image patch. The following sections detail the adaptive sparse representation and entropy coding steps. We omit the details of the decoder; it retrieves the image by a straightforward reversal of the above steps.
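The pre-processing step above can be sketched as follows. This is a minimal illustration: the helper names (`patchify`, `dc_ac_split`), the patch size and the use of NumPy are assumptions for demonstration, not part of the original framework.

```python
import numpy as np

def patchify(image, B):
    """Split an image into L non-overlapping, vectorized B x B patches X = [x_i]."""
    H, W = image.shape
    patches = []
    for r in range(0, H - H % B, B):
        for c in range(0, W - W % B, B):
            patches.append(image[r:r + B, c:c + B].reshape(-1))  # vector of size N = B*B
    return np.stack(patches, axis=1)  # shape (N, L)

def dc_ac_split(X):
    """Separate DC components (patch means) from AC components, as in the DLC encoder."""
    M = X.mean(axis=0)        # DC components m_i
    Y = X - M[np.newaxis, :]  # AC components y_i, later represented over the dictionary D
    return M, Y

image = np.arange(64, dtype=float).reshape(8, 8)
X = patchify(image, B=4)   # L = 4 patches, each of size N = 16
M, Y = dc_ac_split(X)
```

The AC matrix `Y` is what would be sparsely coded over the trained dictionary, while `M` goes to the DPCM/entropy-coding path described below.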

#### 3.1.1 Adaptive synthesis sparse representation

The sparse representation step has a strong impact on how effectively the image patches are encoded and thus on the rate-distortion performance. The graph-based visual saliency (GBVS) model, proposed in [42], can extract the saliency map of an image, i.e. the locations within an image that are of high visual interest to a human observer. By incorporating this model into the sparse representation step, we build an image coding scheme that compresses the image more efficiently than traditional coders; please refer to [42] for the details of the GBVS model. The saliency map of the image is exploited to determine the visual significance of the image patches, in order to allocate a different sparsity level to each patch according to its visual significance to the HVS.

Figure 1. Block diagram of the dictionary learning-based image coding (DLC) framework.


It should be noted that we normalize the saliency map values to the range [0, 1]. The intensity of each pixel of this normalized saliency map therefore stands for the probability that the pixel belongs to a salient region. Some examples of saliency maps are shown in Figure 2. In the second row, brighter regions represent the salient locations to which a human observer pays more attention, while the darker areas represent the less salient regions.

To obtain the saliency value of each image patch, the saliency map is first partitioned into non-overlapping blocks of size B × B pixels. Then, the average of the saliency values of the pixels belonging to each block is taken as the saliency value of the corresponding patch. Let $H_i$ denote the saliency value of the i-th image patch. Since the saliency value of each image patch varies with its position within the image, assigning a different sparsity level to each block according to its saliency value leads to an improvement in the rate-distortion performance.

Consider the sparsity level of i-th image patch as:

$$S\_{i} = \alpha\_{i} S.\tag{8}$$


On the Application of Dictionary Learning to Image Compression

DOI: http://dx.doi.org/10.5772/intechopen.89114


Given the target sparsity level S, we aim to allocate a different sparsity level $S_i$ to the i-th image patch. It should be noted that the average of the sparsity levels of all image patches must be equal to (or slightly below) the target sparsity level S. The factor $\alpha_i$ is easily obtained via:

$$\alpha\_i = \frac{L \times H\_i}{\sum\_{i=1}^{L} H\_i}.\tag{9}$$

where $[H_i]_{i=1}^{L}$ denotes the set of saliency values. As a result, a set of sparsity levels $[S_i]_{i=1}^{L}$ is obtained via Eq. (8). Based on these sparsity levels, the sparse representation of each patch $x_i$ over the dictionary $D$ is obtained by:

$$\underset{\mathbf{c}\_{i}}{\operatorname{argmin}} \ \|\mathbf{x}\_{i} - \mathbf{D}\mathbf{c}\_{i}\|\_{2} \\ \text{Subject to } \|\mathbf{c}\_{i}\|\_{0} \leq \mathbf{S}\_{i}. \tag{10}$$

The OMP method in [43] is used to solve this problem. By assigning a different sparsity level $S_i$ to each block, a more effective sparse representation of the image is obtained.
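Eqs. (8)–(10) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the rounding of $\alpha_i S$ to an integer is our choice (the chapter leaves it implicit), and the OMP routine is a textbook greedy implementation, not necessarily the exact solver of [43].

```python
import numpy as np

def sparsity_levels(H, S):
    """Eqs. (8)-(9): alpha_i = L * H_i / sum(H); S_i = round(alpha_i * S), at least 1."""
    H = np.asarray(H, dtype=float)
    alpha = len(H) * H / H.sum()
    return np.maximum(1, np.round(alpha * S).astype(int))

def omp(D, x, S_i):
    """Minimal orthogonal matching pursuit: argmin ||x - D c||_2  s.t.  ||c||_0 <= S_i."""
    residual, support = x.copy(), []
    c = np.zeros(D.shape[1])
    for _ in range(S_i):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k not in support:
            support.append(k)
        # re-fit all selected atoms jointly, then update the residual
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    c[support] = coeffs
    return c
```

For example, with the identity as a (trivial) dictionary, `omp(np.eye(4), np.array([3., 0, 0, 0]), 1)` recovers the single active atom exactly.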

#### 3.1.2 Quantization step and entropy coding

Differential pulse-code modulation (DPCM) is used to quantize the DC elements. Let $e_i = m_i - m_{i-1}$ denote the residual between the DC values of two

Figure 2. The original images and their saliency maps.

neighbouring patches i and i − 1 (here we set $m_0 = 0$). Instead of encoding $M = [m_i]_{i=1}^{L}$ directly, the residuals between subsequent DC elements, i.e. $E = [e_i]_{i=1}^{L}$, are quantized and encoded. The quantization is performed via $\mathrm{round}(e_i/b)$, where b is a constant. A dead-zone quantizer is used for quantization of the residuals [43].

To further remove redundancy, a Huffman coder is used to entropy-code the quantized DC values. During the entropy coding step, pre-defined code-word tables built offline are used. The same procedure is employed for quantizing and encoding the nonzero coefficients of $C = [c_i]_{i=1}^{L}$.
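The DPCM-and-quantize step for the DC values can be sketched as follows. The function names and the step size `b` are illustrative; this sketch uses the plain $\mathrm{round}(e_i/b)$ rule from the text and omits the dead-zone refinement and the Huffman stage.

```python
import numpy as np

def dpcm_encode(M, b):
    """DPCM the DC values: e_i = m_i - m_{i-1} with m_0 = 0, then quantize round(e_i / b)."""
    E = np.diff(np.concatenate(([0.0], np.asarray(M, dtype=float))))
    return np.round(E / b).astype(int)

def dpcm_decode(q, b):
    """De-quantize the residuals and integrate them back into DC values."""
    return np.cumsum(np.asarray(q) * b)

q = dpcm_encode([100, 102, 101, 90], b=2)
M_hat = dpcm_decode(q, b=2)  # approximate DC values
```

Because neighbouring patches have similar means, the residuals `E` are small and cluster around zero, which is exactly what makes the subsequent Huffman coding effective.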

A large part of the bit-stream generated by the above scheme is occupied by encoding the indices of the representation coefficients, I. The reason is that the non-zero coefficients in $C = [c_i]_{i=1}^{L}$ have a random structure. To encode this random pattern efficiently, a quad-tree splitting algorithm is employed. First, each vector $c_i$ is divided into two equal parts. Next, a binary test is performed on each part: if the part contains at least one nonzero coefficient, the encoder outputs 1; otherwise, it outputs 0. Each part containing at least one nonzero coefficient is then divided into two parts and the binary test is performed again. This process continues until the maximum partitioning depth is reached. In comparison with other methods such as fixed-length coding [39], this quad-tree splitting algorithm encodes the indices of the nonzero coefficients more efficiently.
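The recursive significance-splitting idea can be sketched as follows; the function name is hypothetical, and this sketch splits down to single entries rather than stopping at a fixed maximum depth.

```python
def split_code(mask):
    """Encode the positions of nonzero entries in a coefficient vector.

    Each segment gets one significance bit (1 if it holds any nonzero
    coefficient); significant segments longer than one entry are halved
    and coded recursively, as in the quad-tree splitting step.
    """
    bits = []

    def visit(lo, hi):
        significant = any(mask[lo:hi])
        bits.append(1 if significant else 0)
        if significant and hi - lo > 1:
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, len(mask))
    return bits

bits = split_code([0, 0, 1, 0, 0, 0, 0, 1])  # 11 bits locate the 2 nonzeros
```

Insignificant halves are pruned with a single 0 bit, which is why this beats fixed-length index coding when the nonzero pattern is sparse.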

#### 3.2 Down-sampling-based codec


In a basic image compression approach, a down-sampling strategy is employed at the encoder side as another lossy operation. At the decoder side, an up-scaling algorithm is then employed to reconstruct the high-resolution (HR) image. In this technique, the high-frequency details within the image are removed by the down-sampling operator before the lossy quantization step. Thus the low-frequency information within the image, which contains most of the image energy, is encoded at a higher bit-rate. This bit allocation scheme leads to better rate-distortion performance at low bit-rates. Although the down-sampling operator improves the rate-distortion performance at low bit-rates, it eliminates the high-frequency (HF) details. This information must be reproduced in order to recover a high-quality image at higher bit-rates.

In the DLC encoder, presented in the previous section, quantization and sparse representation are the main compression tools. In this section, the down-sampling operator is incorporated into the DLC codec as another lossy operator to further remove redundancies within the image. Instead of encoding the whole image, its low-resolution (LR) version is encoded. At the decoder side, an up-scaling algorithm is carefully designed to recover the high-frequency details. As discussed before, this strategy enhances the coding efficiency at low bit-rates. However, designing an efficient up-scaling algorithm, by which the high-frequency information can be correctly recovered, is an important step at the decoder side; a poor up-scaling algorithm leads to weak rate-distortion performance at high bit-rates. The up-scaling scheme presented in this section recovers the HR image by encoding and sending the residual image, i.e. the difference between the original image and the up-scaled one, as side information. The encoder generates the residual image and encodes it via the sparse representation of the residual image over a trained dictionary. The decoder recovers the final image by combining the LR decoded image and the reconstructed residual. Both dictionaries used for the sparse representation of the LR and residual images are trained by a bi-level dictionary learning algorithm. Further, an image analyser is designed and

incorporated into the encoder. The goal of this image analyser is to design an adaptive sparse representation that assigns a higher bit-rate to the salient parts of the image. Experimental results illustrate that this down-sampling-based image compression scheme based on sparse representation (DCSR) achieves better rate-distortion performance than conventional codecs, such as JPEG and JPEG2000, and the DLC codec described in Section 3.1.


The block diagram of the DCSR encoder is shown in Figure 3. This framework mainly consists of five parts: down-sampling, up-scaling, sparse representation, quantization and entropy coding [8]. An image analyser is also introduced into the core of the codec. At the pre-processing step, the down-sampling operator uses a blurring convolution kernel and simple decimation by a factor of three to produce the LR image, which has only 1/9 of the pixels of the original image. At the up-scaling step, the down-sampled image is restored to its original resolution. The goal of the up-scaling step is an accurate algorithm that improves the rate-distortion performance at higher bit-rates. A joint sparse representation method is employed to solve this ill-posed, complex inverse problem; it restores high-frequency details sufficient for the reconstruction of the HR image.

Assume the LR image $X^L$ is obtained via $X^L = H X^H$, where $H$ is the down-sampling operator and $X^H$ denotes the HR image. After image recovery via the up-scaling algorithm, the relationship between the HR and reconstructed images is described as $X^H = \tilde{X}^H + E$, where $E$ is the residual image. The LR image $X^L$ and the residual image $E$ are separately partitioned into non-overlapping image patches of size $B \times B$. Let $[x_i]_{i=1}^{T_L}$ and $[e_i]_{i=1}^{T_E}$ denote the vectorized image blocks in $X^L$ and $E$, respectively, where $T_L$ and $T_E$ are the numbers of patches in $X^L$ and $E$. Moreover, note that $T_L = \frac{1}{9} T_E$.
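The operator $X^L = H X^H$ and the residual $E$ can be sketched as follows. The 3 × 3 box average is an assumed stand-in for the unspecified blurring kernel, and the function names are hypothetical.

```python
import numpy as np

def downsample_by_3(XH):
    """X^L = H X^H: average each 3 x 3 neighbourhood (a simple blur) and keep one
    sample per neighbourhood, so the LR image has 1/9 of the original pixels."""
    H, W = XH.shape
    XL = np.zeros((H // 3, W // 3))
    for r in range(H // 3):
        for c in range(W // 3):
            XL[r, c] = XH[3 * r:3 * r + 3, 3 * c:3 * c + 3].mean()
    return XL

def residual_image(XH, XH_up):
    """E = X^H - X~^H: the high-frequency side information later coded over D^E."""
    return XH - XH_up

XH = np.arange(36, dtype=float).reshape(6, 6)
XL = downsample_by_3(XH)  # 2 x 2 LR image, 1/9 of the pixels
```

With T_L patches in the LR image and T_E in the (full-size) residual, the 1/9 pixel ratio is exactly the T_L = T_E/9 relation stated above.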

At the sparse representation step, the LR image patches $[x_i]_{i=1}^{T_L}$ are sparsely represented over an over-complete dictionary $D^L$ of size $N \times K$ using the OMP algorithm. An image analyser is carefully designed to use the visual saliency information at the sparse representation step; its goal is to further reduce the reconstruction errors in the areas of the image that have higher visual interest to the HVS. The same procedure is applied for the sparse representation of the patches $[e_i]_{i=1}^{T_E}$ of the residual image, using a different over-complete dictionary $D^E$ of size $N \times K$. The two dictionaries $D^L$ and $D^E$ are trained offline using a bi-level dictionary learning algorithm and are assumed to be shared between the encoder and decoder. Finally, the obtained sparse coefficients of the LR and residual patches, i.e. $C^L = [c^L_i]_{i=1}^{T_L}$ and $C^E = [c^E_i]_{i=1}^{T_E}$ respectively, are quantized and entropy coded to obtain the bit-stream. $c^L_i \in \mathbb{R}^K$ and $c^E_i \in \mathbb{R}^K$

Figure 3. Block diagram of the DCSR codec.


denote the coefficient vectors of the i-th block in the LR image $X^L$ and the residual image $E$, respectively.

At the decoder side, the image is reconstructed by reversing the above steps. First, the reconstructed LR image $\hat{X}^L$ is up-scaled by the up-scaling operator to restore the image $\hat{X}^H$ at the size of the original image. Then, the final restored image is obtained via $X^D = \hat{X}^H + \hat{E}$, where $\hat{E}$ denotes the decoded residual image.
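The decoder combination $X^D = \hat{X}^H + \hat{E}$ can be sketched as follows. Nearest-neighbour up-scaling is only a placeholder assumption here; the codec itself uses the learned joint-sparse-representation up-scaler.

```python
import numpy as np

def nn_upscale_by_3(XL):
    """Placeholder nearest-neighbour up-scaling by 3; stands in for the learned
    up-scaling operator of the DCSR decoder."""
    return np.kron(XL, np.ones((3, 3)))

def dcsr_decode(XL_hat, E_hat):
    """X^D = X^H_hat + E_hat: up-scale the decoded LR image to full resolution,
    then add the decoded residual side information."""
    XH_hat = nn_upscale_by_3(XL_hat)
    return XH_hat + E_hat

XL_hat = np.array([[1.0, 2.0]])
XD = dcsr_decode(XL_hat, np.zeros((3, 6)))  # residual of zeros: low bit-rate case
```

At low bit-rates no residual is sent, so `E_hat` is simply zero and the output is the up-scaled LR image alone.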

#### 3.2.1 Image analyser for adaptive sparse representation

The main goal of the image analyser is to enhance the performance of the DCSR encoder. It reduces the reconstruction error in the parts of the image that have higher visual interest to a human observer. To this end, the image analyser extracts the salient regions within the image according to their visual significance to the HVS. It then allocates higher rates to the salient regions by assigning different sparsity levels to the image patches located within the salient parts, in both the LR and residual images. As mentioned before, the GBVS model [42] is used to obtain the salient regions within an image; some examples of saliency maps are shown in Figure 2.

To find the sparsity levels of the image patches in the LR image, the obtained saliency map is re-sized to the size of the LR image. It is then divided into B × B blocks, and the saliency value $H_i$ of the i-th block is obtained by taking the average of the saliency values of the pixels within that block. The goal is to assign a different sparsity level $S_i^L$ to each block. Note that the average of the sparsity levels assigned to all blocks within the image should be equal to (or slightly below) the target sparsity level S. Given the target sparsity level S and the set of saliency values $[H_i]_{i=1}^{T_L}$, the sparsity level of the i-th block is obtained by:

$$S\_i^L = \frac{T\_L \times H\_i}{\sum\_{i=1}^{T\_L} H\_i} S.\tag{11}$$

where $T_L$ is the number of blocks in the LR image. Based on this strategy, different sparsity levels are allocated for the sparse representation of the image patches in the LR image. Given the obtained sparsity levels $[S_i^L]_{i=1}^{T_L}$, the sparse representation of the LR patch $x_i$ over the dictionary $D^L$ is obtained by [44]:

$$\underset{\mathbf{c}\_{i}^{L}}{\operatorname{argmin}} \|\mathbf{x}\_{i} - \mathbf{D}^{L}\mathbf{c}\_{i}^{L}\|\_{2} \quad \text{s.t. } \|\mathbf{c}\_{i}^{L}\|\_{0} \leq S\_{i}^{L}.\tag{12}$$

The same procedure is applied to obtain the sparsity levels $[S_i^E]_{i=1}^{T_E}$ for the image patches in the residual image; the corresponding set of saliency values $[H_i]_{i=1}^{T_E}$ is obtained from the saliency map of the original image, as described before. The sparse representation of each patch $e_i$ of the residual image with respect to the dictionary $D^E$ is obtained by:

$$\underset{\mathbf{c}\_{i}^{E}}{\operatorname{argmin}} \|\mathbf{e}\_{i} - \mathbf{D}^{E}\mathbf{c}\_{i}^{E}\|\_{2}, \text{ s.t. } \|\mathbf{c}\_{i}^{E}\|\_{0} \leq S\_{i}^{E}.\tag{13}$$

The OMP method is used to solve these problems. At low bit-rates, DCSR encodes only the LR image. At higher bit-rates, DCSR also encodes the residual image and sends it as side information to improve the coding efficiency. In this case, the patches within the residual and LR images are represented over the corresponding dictionaries with the same target sparsity level S.

Note that the trained dictionary $D^L$ and the up-scaling operator are used to estimate

Differential pulse-code modulation (DPCM) is used to quantize the non-zero elements in the sparse representation vectors $C^L$ and $C^E$. A dead-zone quantizer is used for quantization. Further, classical Huffman coding is employed to entropy-code the quantized sparse coefficients. The predefined code-word tables needed for the Huffman encoder are constructed offline and stored at both the encoder and the decoder.

The locations of the non-zero coefficients in $C^L$ and $C^E$ follow a random pattern, so encoding the indices of these coefficients occupies a large part of the output bit-stream. To encode these indices I efficiently, the quad-tree splitting algorithm described in Section 3.1.2 is employed.

In this section, the rate-distortion performance of the DCSR codec is examined by performing a suite of experiments on a set of 8-bit grey-scale standard images. All evaluated images are re-sized to size of 528 528 pixels in order to produce the LR image with only 1/9 of total pixels of the original image. The well-known peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index are chosen as a measure in the experiments. Further, the rate distortion performance is compared with the JPEG and JPEG2000 standards, as well as the DLC codec, presented in

Block size is set as 8 8 for encoding the LR and residual images. Further, two dictionaries D<sup>L</sup> and D<sup>E</sup> of size 64 440 are trained using the bi-level dictionary learning algorithm presented in Section 3.2.3. We use the images from the CVG-Granada dataset<sup>1</sup> for training. All these training images are partitioned into 8 <sup>8</sup> image patches and 12,000 ones are randomly selected for training. One hundred epochs, each processing 12,000 training vectors, are considered during training process. For up-scaling, the dictionary pair of size 25 256 and the mapping matrix of size 256 256 are trained using the algorithm introduced in Section 3.2.2. The regularization parameters γ, λ, and λ<sup>m</sup> are empirically set as 0.1, 0.1, and 0.01,

Next, we present the rate-distortion graph for the test images in Figures 4 and 5. Note that we also provide the results for different baseline algorithms, i.e. JPEG,<sup>2</sup>

a large part of signal energy. The remained energy of residual is captured by representing the residual over another trained dictionary. This residual is encoded as side information to improve the quality of the restored image at the higher bitrates. The bi-level dictionary learning algorithm is achieved by the following steps: first, the training images are down-sampled by a factor 3. Second, these LR images are employed to train a dictionary D<sup>L</sup> by the K-SVD algorithm [23]. Third, the LR training images are represented with respect to the trained dictionary DL. Forth, the reconstructed LR images are then up-scaled by the algorithm described in previous section. Fifth, these up-scaled images are used to create the training residual images which are subsequently used to train the second dictionary D<sup>R</sup> by the K-SVD

On the Application of Dictionary Learning to Image Compression

DOI: http://dx.doi.org/10.5772/intechopen.89114


#### 3.2.2 Joint dictionary-mapping learning for up-scaling

The goal of the up-scaling algorithm is to recover the high-frequency information eliminated during the down-sampling operation. One well-known up-scaling approach is to learn a function which maps the LR image patches to the HR ones. This mapping function can be obtained using two training databases of LR and HR image patches. Instead of finding the direct mapping function between the LR and HR patches, the authors in [44] show that the mapping function can be efficiently learned in the transform domain. In this approach, the LR and HR patches are first transferred into the representation space over the trained dictionaries, and then the relationship between the representation coefficients is learned.

Let two sets X and Y consist of N training LR and corresponding HR image patches, respectively. Suppose D<sub>x</sub> ∈ ℝ<sup>M<sub>x</sub>×K</sup> and D<sub>y</sub> ∈ ℝ<sup>M<sub>y</sub>×K</sup> denote the trained dictionaries for the sparse representation of the LR and HR patches. These two dictionaries are trained jointly such that the sparse representations of all training LR and HR patches are mapped to each other via a linear function M ∈ ℝ<sup>K×K</sup>. The dictionaries and mapping matrix are obtained by solving the following minimization problem:

$$\begin{aligned} \underset{\mathbf{D}\_{x}, \mathbf{D}\_{y}, \mathbf{\Lambda}\_{x}, \mathbf{\Lambda}\_{y}, \mathbf{M}}{\operatorname{argmin}} & \quad \|\mathbf{X} - \mathbf{D}\_{x}\mathbf{\Lambda}\_{x}\|\_{2}^{2} + \left\|\mathbf{Y} - \mathbf{D}\_{y}\mathbf{\Lambda}\_{y}\right\|\_{2}^{2} + \gamma\left\|\mathbf{\Lambda}\_{x} - \mathbf{M}\mathbf{\Lambda}\_{y}\right\|\_{2}^{2} \\ & \quad + \lambda\left\|\mathbf{\Lambda}\_{x}\right\|\_{1} + \lambda\left\|\mathbf{\Lambda}\_{y}\right\|\_{1} + \lambda\_{m}\left\|\mathbf{M}\right\|\_{2}^{2} \end{aligned} \tag{14}$$

where Λ<sub>x</sub> ∈ ℝ<sup>K×N</sup> and Λ<sub>y</sub> ∈ ℝ<sup>K×N</sup> represent the corresponding sparse representation matrices. The terms γ, λ, and λ<sub>m</sub> denote regularization parameters. As proposed in [45], the minimization problem Eq. (14) is solved in an iterative approach with respect to one variable while all the other variables are kept fixed.

After the training process, the up-scaling is achieved by the dictionary pair (i.e. D<sub>x</sub> and D<sub>y</sub>) and the mapping matrix M. First, the LR image is divided into overlapping blocks of size B<sub>s</sub> × B<sub>s</sub>. Then, the sparse representation vector α<sub>y</sub> of an LR patch y is obtained with respect to the trained dictionary D<sub>y</sub>. In the minimization problem Eq. (14), the ℓ<sub>1</sub>-norm is used to train the dictionaries. Given these trained dictionaries, the up-scaling algorithm uses the ℓ<sub>0</sub>-norm, instead of the ℓ<sub>1</sub>-norm, in order to improve the rate-distortion performance. Thus, the sparse representation α<sub>y</sub> of the LR patch is obtained by solving the following minimization problem:

$$\alpha\_{y} = \underset{\alpha}{\operatorname{argmin}} \left\| \mathbf{y} - \mathbf{D}\_{y} \alpha \right\|\_{2}^{2} + \delta \|\alpha\|\_{0} \,. \tag{15}$$

where δ denotes the regularization parameter. At the next step, we derive α̂<sub>x</sub> = Mα<sub>y</sub> and restore the HR patch via x̂ = D<sub>x</sub>α̂<sub>x</sub>. At the last step, the restored up-scaled image X̂<sup>H</sup> is obtained by placing the recovered image patches back into the whole image grid and taking the average over the overlapped pixels. For more details about solving the above optimization problems, please refer to [44].
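The per-patch code-map-decode step can be sketched as follows. Here a tiny greedy coder stands in for the ℓ<sub>0</sub> problem of Eq. (15), and the toy check uses M = I with a shared sparse code instead of a trained mapping; names are illustrative:

```python
import numpy as np

def sparse_code(D, y, sparsity):
    # Tiny greedy (OMP-style) coder standing in for the l0 problem of Eq. (15).
    support, alpha = [], np.zeros(D.shape[1])
    r = y.astype(float).copy()
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ r)))
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        r = y - D @ alpha
        if np.linalg.norm(r) < 1e-10:
            break
    return alpha

def upscale_patch(y, D_y, D_x, M, sparsity=4):
    """Mapping-based up-scaling of one LR patch:
    code y over D_y, map the coefficients with M, decode over D_x."""
    alpha_y = sparse_code(D_y, y, sparsity)
    alpha_x = M @ alpha_y        # mapped HR representation
    return D_x @ alpha_x         # restored HR patch

# Toy check with a consistent dictionary pair and identity mapping.
rng = np.random.default_rng(1)
K = 32
D_y = rng.standard_normal((25, K)); D_y /= np.linalg.norm(D_y, axis=0)
D_x = rng.standard_normal((64, K)); D_x /= np.linalg.norm(D_x, axis=0)
M = np.eye(K)
alpha = np.zeros(K); alpha[[2, 9]] = [5.0, -5.0]
y, x = D_y @ alpha, D_x @ alpha          # paired LR/HR patches
x_hat = upscale_patch(y, D_y, D_x, M)
print(np.linalg.norm(x - x_hat))         # small when alpha is recovered
```

In the full algorithm this is repeated over overlapping B<sub>s</sub> × B<sub>s</sub> blocks and the overlapped pixels are averaged.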

#### 3.2.3 Bi-level dictionary learning algorithm for image compression

The up-scaling algorithm, described in the previous section, leads to a bi-level dictionary learning algorithm which is used for image compression at the DCSR encoder.


Note that the trained dictionary D<sup>L</sup> and the up-scaling operator are used to capture a large part of the signal energy. The remaining energy is captured by representing the residual over another trained dictionary. This residual is encoded as side information to improve the quality of the restored image at the higher bit-rates. The bi-level dictionary learning algorithm proceeds in the following steps: first, the training images are down-sampled by a factor of 3. Second, these LR images are employed to train a dictionary D<sup>L</sup> with the K-SVD algorithm [23]. Third, the LR training images are represented with respect to the trained dictionary D<sup>L</sup>. Fourth, the reconstructed LR images are up-scaled by the algorithm described in the previous section. Fifth, these up-scaled images are used to create the training residual images, which are subsequently used to train the second dictionary D<sup>E</sup> with the K-SVD algorithm.
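The five steps above can be sketched as a pipeline. In this sketch a random-patch sampler stands in for K-SVD, nearest-neighbour interpolation stands in for the mapping-based up-scaler of Section 3.2.2, random arrays stand in for training images, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def train_dictionary(patches, n_atoms):
    # Stand-in for K-SVD [23]: sample training patches as atoms, normalize.
    idx = rng.choice(patches.shape[1], n_atoms, replace=False)
    D = patches[:, idx].astype(float)
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def downsample3(img):
    # Factor-3 down-sampling by 3x3 block averaging.
    h, w = img.shape[0] - img.shape[0] % 3, img.shape[1] - img.shape[1] % 3
    return img[:h, :w].reshape(h // 3, 3, w // 3, 3).mean(axis=(1, 3))

def patches8(img):
    # Non-overlapping 8x8 patches, flattened to 64-dim column vectors.
    h, w = img.shape[0] - img.shape[0] % 8, img.shape[1] - img.shape[1] % 8
    p = img[:h, :w].reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2).reshape(-1, 64)
    return p.T

imgs = [rng.random((240, 240)) for _ in range(4)]   # toy "training images"
lr_imgs = [downsample3(im) for im in imgs]          # step 1: down-sample by 3
D_L = train_dictionary(np.hstack([patches8(im) for im in lr_imgs]), 128)  # step 2
# Steps 3-4: represent/reconstruct the LR images over D_L and up-scale them
# (nearest-neighbour here, standing in for the mapping-based up-scaler).
up_imgs = [np.kron(im, np.ones((3, 3))) for im in lr_imgs]
residuals = [im - up for im, up in zip(imgs, up_imgs)]                    # step 5
D_E = train_dictionary(np.hstack([patches8(r) for r in residuals]), 128)
print(D_L.shape, D_E.shape)   # (64, 128) (64, 128)
```

The essential point is that the second dictionary is trained on residuals of the up-scaled reconstructions, not on the images themselves.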

#### 3.2.4 Quantization and entropy coding


Differential pulse-code modulation (DPCM) is used to quantize the non-zero elements in the sparse representation vectors C<sup>L</sup> and C<sup>E</sup>. A dead-zone quantizer is used for quantization. Further, classical Huffman coding is employed to entropy-code the quantized sparse coefficients. The predefined code-word tables needed for the Huffman encoder are constructed offline and stored at both the encoder and decoder sides.

The locations of the non-zero coefficients in C<sup>L</sup> and C<sup>E</sup> follow a random pattern. Therefore, encoding the indices of these coefficients occupies a large part of the output bitstream. For efficiently encoding these indices I, the quad-tree splitting algorithm, explained in Section 3.1.2, is employed.
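A minimal sketch of the quantization stage follows, using one common dead-zone form; the Huffman and quad-tree stages are omitted and the names are illustrative, not taken from the DCSR implementation:

```python
import numpy as np

def deadzone_quantize(c, step):
    # Uniform quantizer with a double-width dead zone around zero:
    # every coefficient with |c| < step is mapped to level 0.
    return (np.sign(c) * np.floor(np.abs(c) / step)).astype(int)

def dequantize(q, step):
    # Mid-point reconstruction of the non-zero bins.
    return np.sign(q) * (np.abs(q) + 0.5) * step * (q != 0)

def dpcm_encode(levels):
    # DPCM: keep the first level, then transmit successive differences.
    return np.concatenate(([levels[0]], np.diff(levels)))

def dpcm_decode(stream):
    return np.cumsum(stream)

c = np.array([2.7, -0.2, 0.0, 5.1, -3.3, 0.4])   # toy sparse coefficients
q = deadzone_quantize(c, step=1.0)               # -> [ 2  0  0  5 -3  0]
nz = q[q != 0]                                   # non-zero levels fed to DPCM
stream = dpcm_encode(nz)                         # -> [ 2  3 -8]
print(np.array_equal(dpcm_decode(stream), nz))   # lossless round trip: True
```

The dead zone discards small coefficients (increasing sparsity), while DPCM exploits correlation between successive non-zero levels before entropy coding.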

#### 3.3 Experimental results

In this section, the rate-distortion performance of the DCSR codec is examined by performing a suite of experiments on a set of 8-bit grey-scale standard images. All evaluated images are resized to 528 × 528 pixels so that the LR image contains only 1/9 of the total pixels of the original image. The well-known peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index are chosen as quality measures in the experiments. Further, the rate-distortion performance is compared with the JPEG and JPEG2000 standards, as well as the DLC codec presented in Section 3.1.

The block size is set to 8 × 8 for encoding the LR and residual images. Further, two dictionaries D<sup>L</sup> and D<sup>E</sup> of size 64 × 440 are trained using the bi-level dictionary learning algorithm presented in Section 3.2.3. We use the images from the CVG-Granada dataset<sup>1</sup> for training. All these training images are partitioned into 8 × 8 image patches, and 12,000 of them are randomly selected for training. One hundred epochs, each processing 12,000 training vectors, are considered during the training process. For up-scaling, the dictionary pair of size 25 × 256 and the mapping matrix of size 256 × 256 are trained using the algorithm introduced in Section 3.2.2. The regularization parameters γ, λ, and λ<sub>m</sub> are empirically set to 0.1, 0.1, and 0.01, respectively.

Next, we present the rate-distortion graphs for the test images in Figures 4 and 5. Note that we also provide the results for the baseline algorithms, i.e. JPEG,<sup>2</sup>

<sup>1</sup> http://decsai.ugr.es/cvg/dbimagenes

<sup>2</sup> http://www.ijg.org

#### Figure 4.

Rate-distortion performance of the DCSR codec compared to JPEG, JPEG2000 and DLC codecs in terms of PSNR for several test images (528 × 528, grey-level).

JPEG2000,<sup>3</sup> and the DLC algorithm [12]. As these figures show, the DLC and DCSR codecs provide a better performance (in terms of PSNR and SSIM) than the image coding standards JPEG and JPEG2000. Interestingly, as Figure 4 demonstrates, the DCSR algorithm enhances the quality of the image over the JPEG2000 codec. The enhancement depends on the statistics of the image, so a different enhancement is observed for different images. Please note that SSIM shows the perceived quality


<sup>3</sup> http://www.openjpeg.org


#### Figure 5.


Rate-distortion performance of the DCSR codec compared to JPEG and DLC codecs in terms of MSSIM for several test images (528 × 528, grey-level).


#### Figure 6.

Compressed images at bit-rate (0.15) using the JPEG (left) and DCSR (right) schemes.

of the compressed image; therefore, the SSIM curves for DCSR and JPEG2000 are very close to each other. For a better visualization, we removed the rate-distortion curve of JPEG2000 from Figure 5. As the trained dictionary captures and represents the contours more accurately, the DCSR codec achieves a higher performance compared to analytic dictionaries such as the wavelet transform basis. Moreover, an improvement of about 0.2 dB is observed compared with the DLC algorithm, marginally better for simpler images, like Mandrill.

Several compressed images are shown in Figure 6 for visual comparison. All images are coded at the same bit-rate (0.15) using the JPEG and DCSR algorithms. As can be seen, the JPEG standard fails to reconstruct images compressed at very low bit-rates. In comparison, the DCSR algorithm preserves more image details. It is more effective in the reconstruction of both the smooth areas and the complex regions, including texture and edges, leading to a visually much more pleasant recovery.

As a consequence, the down-sampling-based coding discards the high-frequency details of the image before the lossy quantization step; hence, a larger number of bits is assigned to the low-frequency information, which occupies most of the energy in the image. The presented bit-assignment strategy brings a higher performance for images coded at low bit-rates. At higher bit-rates, the coding efficiency is enhanced once the residual image is encoded as side information.

### 4. Conclusions

A brief summary of the signal modelling methodology has been given in the first part of the chapter. We continued with the explicit and straightforward formulation of sparse representation, which is more suitable for compression tasks. We focused on synthesis-based signal modelling, since image compression using analysis-based sparse signal modelling is less mature. An adaptive sparse representation over a trained over-complete dictionary was presented to compress images. More specifically, given the saliency map of the image to be encoded, an image patch could be well represented by a linear combination of atoms selected from an over-complete, trained dictionary according to the sparsity level. Finally, in the last part of this chapter, a down-sampling operation was incorporated into the codec in order to improve the compression performance. The experimental results demonstrated that the presented image compression frameworks outperform image coding standards, such as JPEG and JPEG2000, which use an analytic dictionary.

### Author details


Ali Akbari<sup>1,2</sup>\* and Maria Trocan<sup>2</sup>

1 University of Surrey, Guildford, UK

2 Institut supérieur d'électronique de Paris, Paris, France


\*Address all correspondence to: ali.akbari@isep.fr

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### References

[1] Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research. 1997;37(23):3311-3325

[2] Mallat S. A Wavelet Tour of Signal Processing. 3rd ed. Academic Press; 2008

[3] Pennec EL, Mallat S. Bandelet image approximation and compression. SIAM Multiscale Modeling & Simulation. 2005;4(3):992-1039

[4] Villegas OOV, Elias RP, Villela PR, Sanchez VGC, Salazar AM. Edging out the competition: Lossy image coding with wavelets and contourlets. IEEE Potentials. 2008;27(2):39-44

[5] Wang S, Shu Z, Zhang L, Liu G, Gan L. Iterative image coding with overcomplete curvelet transform. In: Proceedings of IEEE Congress on Image and Signal Processing (CISP), vol. 1; May 2008; Hainan, China. pp. 666-670

[6] Beferull-Lozano B, Ortega A. Coding techniques for oversampled steerable transforms. In: Proceedings of Asilomar Conference on Signals, Systems, and Computers, vol. 2; October 1999; Pacific Grove, CA. pp. 1198-1202

[7] Akbari A, Mandache D, Trocan M, Granado B. Adaptive saliency-based compressive sensing image reconstruction. In: Proceedings of IEEE International Conference on Multimedia Expo Workshops (ICMEW); July 2016; Seattle, WA. pp. 1-6

[8] Akbari A, Trocan M. Downsampling based image coding using dual dictionary learning and sparse representations. In: Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP); August 2018. pp. 1-5

[9] Akbari A, Trocan M. Sparse recovery-based error concealment for multiview images. In: Proceedings of IEEE International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM); October 2015; Prague, Czech Republic. pp. 1-5


[10] Akbari A, Trocan M, Granado B. Synthesis sparse modeling: Application to image compression and image error concealment. In: Proceedings of Signal Processing with Adaptive Sparse Structured Representations Workshop (SPARS); June 2017; Lisbon, Portugal

[11] Mandache D, Akbari A, Trocan M, Granado B. Image compressed sensing recovery using intra-block prediction. In: Proceedings of IEEE International Conference on Telecommunications Forum (TELFOR); November 2015; Belgrade, Serbia. pp. 748-751

[12] Akbari A, Trocan M, Granado B. Image compression using adaptive sparse representations over trained dictionaries. In: Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP); September 2016; Montreal, Canada. pp. 1-6

[13] Akbari A, Trocan M, Granado B. Image error concealment using sparse representations over a trained dictionary. In: Proceedings of IEEE Picture Coding Symposium (PCS); December 2016; Nuremberg, Germany. pp. 1-5

[14] Akbari A, Trocan M, Granado B. Residual based compressed sensing recovery using sparse representations over a trained dictionary. In: Proceedings of International ITG Conference on Systems, Communications and Coding (SCC); February 2017; Hamburg, Germany. pp. 1-6

[15] Akbari A, Trocan M, Granado B. Sparse recovery-based error concealment. IEEE Transactions on Multimedia. 2017;19(6):1339-1350


[16] Akbari A, Trocan M, Granado B. Joint-domain dictionary learning-based error concealment using common space mapping. In: Proceedings of IEEE International Conference on Digital Signal Processing (DSP); August 2017. pp. 1-5

[17] Akbari A, Trocan M, Granado B. Image error concealment based on joint sparse representation and non-local similarity. In: Proceedings of IEEE Global Conference on Signal and Information Processing (GlobalSIP); November 2017

[18] Akbari A, Trocan M, Sanei S, Granado B. Joint sparse learning with nonlocal and local image priors for image error concealment. IEEE Transactions on Circuits and Systems for Video Technology. 2019

[19] Akbari A, Trocan M, Granado B. Image compressed sensed recovery using saliency-based adaptive sensing and residual reconstruction. In: Compressed Sensing: Methods, Theory and Applications. New York: Nova Science Publishers; 2018

[20] Akbari A, Trevisi M, Trocan M, Carmona-Galán R. Compressive imaging using rip-compliant cmos imager architecture and landweber reconstruction. IEEE Transactions on Circuits and Systems for Video Technology. 2019:1-1

[21] Akbari A, Trocan M. Robust image reconstruction for block-based compressed sensing using a binary measurement matrix. In: 2018 25th IEEE International Conference on Image Processing (ICIP); October 2018. pp. 1832-1836

[22] Akbari A, Trevisi M, Trocan M. Adaptive compressed sensing image reconstruction using binary measurement matrices. In: 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS); December 2018. pp. 659-660

[23] Aharon M, Elad M, Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing. 2006;54(11): 4311-4322

[24] Ahmed N, Natarajan T, Rao KR. Discrete cosine transform. IEEE Transactions on Computers. 1974;C-23 (1):90-93

[25] Wallace GK. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics. 1992;38(1):xviii-xxxiv

[26] Skodras A, Christopoulos C, Ebrahimi T. The JPEG 2000 still image compression standard. IEEE Signal Processing Magazine. 2001;18(5):36-58

[27] Rubinstein R, Bruckstein AM, Elad M. Dictionaries for sparse representation modeling. Proceedings of the IEEE. 2010;98(6):1045-1057

[28] Pati YC, Rezaiifar R, Krishnaprasad PS. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In: Proceedings of Asilomar Conference on Signals, Systems and Computers; November 1993; Pacific Grove, CA. pp. 40-44

[29] Candès E, Demanet L, Donoho D, Ying L. Fast discrete curvelet transforms. SIAM Multiscale Modeling & Simulation. 2006;5(3):861-899

[30] Do MN, Vetterli M. Contourlets: A new directional multiresolution image representation. In: Proceedings of Asilomar Conference on Signals, Systems and Computers, vol. 1; November 2002; Pacific Grove, CA. pp. 497-501

[31] Pennec EL, Mallat S. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing. 2005;14(4):423-438

[32] Engan K, Skretting K, Husoy JH. Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digital Signal Processing. 2007;17(1):32-49

[33] Vidal R, Ma Y, Sastry S. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(12): 1945-1959

[34] Sulam J, Ophir B, Zibulevsky M, Elad M. Trainlets: Dictionary learning in high dimensions. IEEE Transactions on Signal Processing. 2016;64(12): 3180-3193

[35] Lewicki MS, Sejnowski TJ. Learning overcomplete representations. Neural Computation. 2000;12(2):337-365

[36] Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ. Dictionary learning algorithms for sparse representation. Neural Computation. 2003;15(2): 349-396

[37] Gurumoorthy KS, Rajwade A, Banerjee A, Rangarajan A. A method for compact image representation using sparse matrix and tensor projections onto exemplar orthonormal bases. IEEE Transactions on Image Processing. 2010;19(2):322-334

[38] Mazaheri JA, Guillemot C, Labit C. Learning a tree-structured dictionary for efficient image representation with adaptive sparse coding. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); May 2013; Vancouver, Canada. pp. 1320-1324

[39] Sun Y, Tao X, Li Y, Lu J. Dictionary learning for image coding based on multisample sparse representation. IEEE Transactions on Circuits and Systems for Video Technology. 2014;24(11):2004-2010


[40] Peyre G. A review of adaptive image representations. IEEE Journal of Selected Topics in Signal Processing. 2011;5(5):896-911

[41] Borji A, Cheng MM, Jiang H, Li J. Salient object detection: A benchmark. IEEE Transactions on Image Processing. 2015;24(12):5706-5722

[42] Harel J, Koch C, Perona P. Graph-based visual saliency. In: Proceedings of Neural Information Processing Systems (NIPS); 2006. pp. 545-552

[43] Skretting K, Engan K. Image compression using learned dictionaries by rls-dla and compared with K-SVD. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); May 2011; Prague, Czech Republic. pp. 1517-1520

[44] Wang S, Zhang L, Liang Y, Pan Q. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2012; Providence, RI. pp. 2216-2223

[45] Huang DA, Wang YCF. Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In: Proceedings of IEEE Conference on Computer Vision (ICCV); December 2013; Sydney, Australia. pp. 2496-2503

## Chapter 12

## The DICOM Image Compression and Patient Data Integration using Run Length and Huffman Encoder

Trupti N. Baraskar and Vijay R. Mankar

DOI: http://dx.doi.org/10.5772/intechopen.89143

### Abstract

Maintaining human healthcare is one of the biggest challenges facing the growing populations of Asian countries today. There is an unrelenting need in the medical community to develop applications that are low in cost and offer high compression, as huge numbers of patient records and images must be transmitted over the network to be reviewed by physicians for diagnostic purposes. The implemented work presents a discrete wavelet-based threshold approach. In this approach, N-level decomposition with 2D wavelet types such as Biorthogonal, Haar, Daubechies, Coiflets, Symlets, Reverse Biorthogonal, and Discrete Meyer yields wavelet coefficients at various levels. A lossless hybrid encoding algorithm, which combines a run-length encoder and a Huffman encoder, is used for compression and decompression. This work examines the efficiency of the different wavelet types in order to determine the best one. The objective of this research work is to improve the compression ratio and compression gain.

Keywords: DICOM, discrete wavelet, N-level decomposition, threshold approach, data hiding algorithms

#### 1. Introduction

Digital technology has, in the last few decades, entered almost every aspect of medicine, and there has been huge development in noninvasive medical imaging equipment. Since there are multiple manufacturers of medical equipment, there is a strong need for a standard for the storage and exchange of medical images. DICOM (Digital Imaging and Communications in Medicine) makes medical image exchange easier and independent of the imaging equipment manufacturer. The DICOM standard was developed by ACR-NEMA to meet the needs of manufacturers and users of medical imaging equipment for the interconnection of devices on standard networks. DICOM technology is suitable for sending images between different departments within a hospital, between hospitals, and to a consultant. A DICOM file contains both a header, which includes text information such as the patient's name, modality, and image size, and the image data in the same file. Hence DICOM standards are widely used in the integration of digital imaging systems in medicine. Figure 1 shows the structure of a DICOM image file. It has three main components. The first is the header; it consists of a 128-byte file preamble followed by a 4-byte prefix containing a four-character string. The second is the data set; it consists of multiple data elements, each of which has four fields: tag, value representation, value length, and value field. The third is the image pixel intensity data; it contains the data necessary to display the medical image, such as the number of frames, lines, and columns.

#### Figure 1.

The structure of a DICOM image file.

#### 1.1 File format used by DICOM images

There are four major file formats in medical imaging: Neuroimaging Informatics Technology Initiative (NIfTI), Analyze, DICOM, and MINC. The task of an image file format is to provide a standardized way to store the data in an organized and systematic manner, and to specify how the pixel data are to be loaded, visualized, and analyzed by software. The major file format currently used in medical imaging is DICOM. The DICOM format includes information that can be useful for image registration, such as the position and orientation of the image with respect to the data acquisition device, and patient information with respect to voxel size. The DICOM file format design is based on the following concepts: pixel depth, photometric interpretation, metadata, and pixel data. A DICOM file is formed by the addition of the header and the pixel data. The mathematical equations are as follows:

$$\text{DICOM File Format} = \text{Header Size} + \text{Pixel Data Size} \tag{1}$$

$$\text{Pixel Data Size} = \text{Rows} \times \text{Columns} \times \text{Pixel Depth} \times \text{Number of Frames} \tag{2}$$

The more popular formats used in daily practice are JPEG, JPEG 2000, TIFF, GIF, PNG, and BMP. Images saved in these formats can be accessed on any personal computer without the need for specific viewers. File formats are designed with the help of image conversion techniques and coding schemes.

The rest of the chapter is organized as follows. Section 2 provides an overview of the image file formats and image standards used for medical image compression; data hiding methods are described in Section 3. Section 4 briefly explains the proposed work. Section 5 discusses the results obtained after implementation of the application. Finally, Section 6 concludes the chapter.

#### 2. Related work

The related work is a comprehensive summary of previous research on image file formats, standard image compression using transform coding, and the integration of patient information into DICOM images. The more popular formats used in daily practice are JPEG, JPEG 2000, TIFF, GIF, PNG, and BMP. Images saved in these formats can be accessed on any personal computer without the need for specific viewers. File formats are designed with the help of image conversion techniques and coding schemes [1–3]. Figure 2 shows the basic digital image file formats and their classification. Vector images are not commonly used in medical data processing.

#### Figure 2.

Basic classification of digital image file formats.

Table 1 gives a summary of the various parameters of raster image file formats [4, 5, 7–11]. Table 2 gives a characteristic overview of the major file formats currently used in medical imaging, i.e., NIfTI, Analyze, DICOM, and MINC [6, 12, 13].
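As a quick sanity check, the file-size relations in Eqs. (1) and (2) can be evaluated directly. The header size and image dimensions below are illustrative values only, not taken from any particular DICOM file, and the pixel depth is assumed to be given in bits.

```python
def pixel_data_size(rows, columns, pixel_depth_bits, num_frames):
    """Eq. (2): pixel data size in bytes (pixel depth converted from bits)."""
    return rows * columns * (pixel_depth_bits // 8) * num_frames

def dicom_file_size(header_size, rows, columns, pixel_depth_bits, num_frames):
    """Eq. (1): total file size = header size + pixel data size."""
    return header_size + pixel_data_size(rows, columns, pixel_depth_bits, num_frames)

# Illustrative example: a single-frame 512 x 512 image with 16-bit pixels
# and an assumed 1 KiB header.
size = dicom_file_size(1024, 512, 512, 16, 1)
```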

Compression methods fall into two main categories: lossless (reversible) and lossy (irreversible). Lossless compression has wide application in the archival of medical and digital radiography documents, where any loss of information in the original image could lead to improper diagnosis; medical imaging applications therefore require lossless image compression, which makes the development of medical image compression applications a challenging problem. The survey paper [14] conveys that a compression ratio of 4:1 is possible using lossless compression. With the increasing volume of data generated by new imaging modalities such as CT and MRI, lossy compression techniques are used to decrease the cost of storage and to increase the efficiency of transmission over networks for teleradiology applications [12]. DICOM supports lossless compression schemes such as run-length encoding, Huffman coding, LZW coding, area coding, and arithmetic coding. The RLE is used



for medical image compression in a hybrid approach, where the gray scale value reveals certain interesting facts about the distribution in the image: the background pixels of all medical images have low values, and they differ by +3 or −3. The RLE is based on a dynamic array implementation, so there is no need to process the whole image [8]. That paper represents the number of repeated zero counts as "RUN" and appends the nonzero coefficients represented as "LEVEL" [15, 16]. Huffman coding is a lossless compression technique used for medical image compression. It works on the variable-length encoding principle, which includes calculating the lengths of the unique codes, and it generates a binary tree, also known as a Huffman tree. The Huffman algorithm gives a higher compression ratio in the case of medical image compression. During the whole compression process there should not be any loss of information that would affect proper diagnosis [17, 18]. The Huffman code is designed to merge the lowest-probability symbols, and this merging is repeated until only the probabilities of two symbols are left. The survey paper [19] discusses certain improvements to the existing Huffman technique which help to prevent any loss of information during compression that would affect proper diagnosis. In the lossy compression method, data are discarded during compression and cannot be recovered completely; this method reaches much greater compression performance than lossless compression. Wavelet and higher-level JPEG are examples of lossy compression techniques, while JPEG 2000 is a progressive lossless-to-lossy compression algorithm [20–22]. This work uses the concept of hiding data in an image for data encryption. In order to enable a large data-hiding capacity while maintaining good image quality, the data integration is applied to the detail coefficients of the high-frequency sub-bands, working in the transform domain of the multilevel two-dimensional discrete wavelet transform. The objective of this implementation is to compress the image as much as possible, which helps to reduce the redundancy of the image and to store or transmit the data in an efficient form. As in telemedicine the medical images are transmitted over advanced links, medical image compression without any loss of useful information is of immense importance for the fast transfer of medical data [23].

#### 3. Proposed work

This proposed compression approach deals with the .dcm file of the DICOM format. It splits the .dcm file into the patient data, with a bmp.txt extension, and a gray scale image, with a .bmp extension. Then N-level DWT using various wavelet types is applied to the gray image. Firstly, this splits the image into n high-frequency sub-bands (HLn, LHn, HHn), where n = 1, 2, 3, …, N, and one low-frequency sub-band (LLn), where n = maximum level (N). The high-frequency sub-bands at levels 1, 2, 3, and 4 are thresholded and quantized, and the resulting detail coefficients are encoded directly through run-length encoding. Secondly, the one low-frequency sub-band is also thresholded and quantized to find the high-level approximate coefficients. Lastly, both sets of coefficients (the detail coefficients and the high-level approximate coefficients) are encoded by Huffman coding.

In a gray scale image, each pixel is represented by an 8-bit unsigned integer value. The minimum and maximum values of an unsigned integer are 0 and 255; 0 represents black and 255 represents white. In a text file, every character is represented by an ASCII value, which runs between 0 and 128. The extended ASCII value is 8 bits, and it matches the 8-bit pixel intensity, so both entities are treated as normal integers. In this proposed system, we have used a bitwise XOR approach for integrating the text into the image. The proposed work deals with the ASCII conversion of the patient data and the multilevel two-dimensional discrete wavelet transform. Advantages of this work are high data integrity even with large patient data; accepted levels of imperceptibility, excellent PSNR values, high CR, and good payload capacity are obtained. Figures 3 and 4 represent the block diagrams of compression and decompression with the data integration scheme. The decompression process is the inverse of the DICOM image compression, as shown in the block diagram.
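The hybrid scheme described in the related work — run-length coding of zero runs ("RUN") paired with nonzero values ("LEVEL"), followed by Huffman coding of the resulting pairs — can be sketched as follows. This is a minimal illustration of the two stages, not the authors' exact implementation; the coefficient values are made up.

```python
import heapq
from collections import Counter

def rle_encode(coeffs):
    """Encode a 1D coefficient sequence as (RUN, LEVEL) pairs:
    RUN = number of zeros preceding each nonzero LEVEL."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bitstring) by repeatedly
    merging the two lowest-probability nodes, as described above."""
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate single-symbol case
        return {s: "0" for s in heap[0][2]}
    nxt = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], nxt, merged])
        nxt += 1
    return heap[0][2]

# Example: quantized sub-band coefficients with zero runs.
coeffs = [0, 0, 5, 0, 0, 0, -3, 5, 0, 5]
pairs = rle_encode(coeffs)                  # [(2, 5), (3, -3), (0, 5), (1, 5)]
code = huffman_code(pairs)                  # prefix-free code over the pairs
bitstream = "".join(code[p] for p in pairs)
```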

#### Table 1.

Summary of raster (bitmap) image file format.

#### Table 2.

Characteristics of medical image file format.

#### Figure 3.

Block diagram of DICOM image compression and data integration method.

#### Figure 4.

Block diagram of DICOM image decompression and data and image extraction.

#### 3.1 Algorithm for integration of patient information (text) into image file

1. Select a proper grayscale BMP image.
2. Select the patient text file with .txt extension.
3. Execute a while loop (character count in the text file ≤ number of pixels in the image).
4. Convert the character vector and the pixel value into unsigned 16-bit integers using the functions str2num and uint16.
5. Integrated pixel = (converted 16-bit unsigned pixel value) bitwise XOR (converted 16-bit unsigned character value).
6. Increment the pixel value until the last value.
7. Increment the character value until the last value.
8. Update the character count in the text file: character count = character count + 1.
9. End loop.
10. The rest of the integrated pixels = original pixels of the image.

#### 3.2 Extraction of patient information (text) from the image file

1. Open the original image and the image with the text integrated into its pixels.
2. Execute a while loop (pixel count in the image file ≤ number of pixels in the image).
3. Convert the pixel value into a 16-bit integer value.
4. Integrated pixel = convert to 16-bit integer (integrated pixel).
5. A = (original pixel value) bitwise XOR (integrated pixel value).
6. If A = 0 then break; else append the extracted text character, which is equal to A.
7. Original pixel = next original pixel in the image.
8. Integrated pixel = next integrated pixel in the image.
9. End loop.
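The two loops above can be sketched as follows. This is a minimal sketch: `integrate_text` and `extract_text` are hypothetical names, and plain Python integers stand in for the chapter's MATLAB-style uint16 values.

```python
# Sketch of Sections 3.1-3.2: XOR each character code into one pixel;
# the remaining pixels pass through unchanged (step 10 of Section 3.1).

def integrate_text(pixels, text):
    """Embed text into a pixel list by bitwise XOR (hypothetical helper)."""
    if len(text) > len(pixels):
        raise ValueError("text must fit in the image")
    stego = [(p ^ ord(c)) & 0xFFFF for p, c in zip(pixels, text)]
    return stego + pixels[len(text):]       # rest = original pixels

def extract_text(original_pixels, stego_pixels):
    """XOR original and stego pixels; A = 0 marks the end of the text."""
    chars = []
    for p, s in zip(original_pixels, stego_pixels):
        a = p ^ s
        if a == 0:                          # step 6 of Section 3.2: break
            break
        chars.append(chr(a))                # else the character equals A
    return "".join(chars)

pixels = [120, 45, 200, 78, 255, 13, 90, 160]   # toy 8-pixel "image"
stego = integrate_text(pixels, "MRI")
print(extract_text(pixels, stego))              # -> MRI
```

Note that, as in the chapter's algorithm, extraction needs the original image, and a character whose code equals its host pixel would XOR to zero and truncate the text early.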

#### 3.3 Multilevel 2D DWT decomposition on wavelet types

We implement an N-level 2D DWT decomposition. At each level, the LL sub-band from the previous level is taken and replaced with four new sub-bands, each half the width and half the height of its parent. The total number of sub-bands depends on the number of levels n and equals 3n + 1, where HHn represents the high-frequency band, LLn the low-frequency band, and LHn and HLn the middle-frequency bands. The coefficients in LLn are dominant: if any coefficient in the LLn frequency band is changed, an observer can see that the corresponding spatial-domain image has been modified.
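One decomposition level can be sketched as below. This is a simplified illustration using an unnormalized Haar filter (the chapter itself uses Biorthogonal, Daubechies, and other wavelets); the helper names are illustrative.

```python
# One level of a 2D Haar-style decomposition: each 2x2 block yields one
# coefficient in each of the four sub-bands, so every level replaces LL
# with four half-size sub-bands and n levels give 3n + 1 sub-bands total.

def haar2d_level(img):
    """Split an even-sized 2D list into four half-size sub-bands."""
    rows, cols = len(img), len(img[0])
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, rows, 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll.append((a + b + c + d) / 4)   # average (low-pass both ways)
            lh.append((a - b + c - d) / 4)   # detail along columns
            hl.append((a + b - c - d) / 4)   # detail along rows
            hh.append((a - b - c + d) / 4)   # diagonal detail
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH

def subband_count(n):
    return 3 * n + 1                         # formula from Section 3.3

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
LL, LH, HL, HH = haar2d_level(img)
print(len(LL), len(LL[0]))                   # half the height and width: 2 2
print(subband_count(3))                      # N = 3 -> 10 sub-bands
```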

Figure 5 shows the process of character integration in the LHn sub-band and the generated wavelet coefficients. It illustrates the scale and orientation selectivity of the DWT. The greatest energy is contained in the LLn sub-band and the least in the HHn sub-band. The HLn sub-band contains the vertical edges, and the LHn sub-band contains the horizontal edges. In this proposed work, we focus on data integration in the LHn sub-bands because this band has a high energy distribution compared to the other detail bands, HLn and HHn. The finest wavelet type and the appropriate coefficients are then selected using thresholding and quantization.

Figure 5.

The process of character integration in LHn sub-bands and generation of wavelet coefficients.

#### 3.4 Apply thresholding and quantization technique on generated coefficients

#### 3.4.1 Thresholding

Let an original image be an n × n matrix; the noise observation can be written as s = x + n, where s is the noise observation, x the original image, and n the noise. Let s(i), x(i), and n(i) denote the ith pixel sample. Applying the discrete wavelet transform to the observed noisy image gives the wavelet coefficients y = θ + z, where y = Ws, θ = Wx, and z = Wn, respectively. To recover θ from y, y is transformed into the wavelet domain, which decomposes y into many sub-bands [19, 20]. The coefficients with small values in the sub-bands are then dominated by noise, so these noise coefficients are replaced by zero. This is denoted by

$$y(i) = \theta(i) + z(i) \tag{3}$$


If

$$\text{abs}[y(i)] < \lambda \tag{4}$$

then

$$\widehat{y}_d(i) = 0 \tag{5}$$

where y(i) are the input noisy wavelet coefficients, λ is the threshold value, and ŷd(i) is the thresholded output; coefficients with abs[y(i)] ≥ λ are left unchanged.

We define the PCDZ parameter, which gives the percentage of zero-valued DWT coefficients.

$$PCDZ = 100 \ast \frac{\text{NBz}}{\text{Ly}} \tag{6}$$

where NBz = number of zeros in DWT coefficients.

Ly = Number of Coefficients in DWT.

The proposed method uses the global threshold value derived by Donoho [19, 20], given by the equation below; it is known as the universal threshold.


$$
\lambda = \sigma \sqrt{2 \log Ly} \tag{7}
$$

where Ly is the number of pixels in the medical image and σ is the noise variance.
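The thresholding step of Eqs. (4)–(7) can be sketched as below. The helper names are illustrative, and the natural logarithm is assumed for Eq. (7), following Donoho's universal threshold.

```python
import math

# Hard thresholding with the universal threshold of Eq. (7):
# lambda = sigma * sqrt(2 * log(Ly)); coefficients with |y(i)| < lambda
# are replaced by zero (Eqs. 4-5), the rest are kept unchanged.

def universal_threshold(sigma, Ly):
    return sigma * math.sqrt(2.0 * math.log(Ly))

def hard_threshold(coeffs, lam):
    return [0 if abs(y) < lam else y for y in coeffs]

coeffs = [12.5, -0.8, 0.3, -7.1, 1.9, 0.05]     # toy DWT coefficients
lam = universal_threshold(sigma=1.0, Ly=len(coeffs))
kept = hard_threshold(coeffs, lam)
pcdz = 100 * kept.count(0) / len(kept)          # PCDZ of Eq. (6)
print(round(lam, 3), kept, pcdz)
```

With these toy values the threshold is about 1.893, so three of the six coefficients are zeroed and PCDZ is 50%.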

#### 3.4.2 DWT coefficient quantization

Quantization at each level collects the set of nearest values. Uniform quantization maps the thresholded DWT coefficients in each sub-band into the interval between 0 and 2<sup>Q</sup>. The quantized matrix is computed as follows: choose the quantization value Q (Q determines the interval width for quantization of the DWT coefficients in the sub-bands), then determine the max(y(i)) and min(y(i)) values of the DWT coefficients, denoted DWTmax and DWTmin. The uniform quantization of the resulting DWT coefficient sub-bands is formulated by the following equations:

$$DWTmin = \min(y(i))\tag{8}$$

$$DWTmax = max(\mathcal{y}(i))\tag{9}$$

$$DWTCQ = \text{round}\left((-1 + 2^Q) * \frac{y_d(i) - DWTmin}{DWTmax - DWTmin}\right)\tag{10}$$

$$IDWTCQ = \text{round}\left(\frac{DWTmax - DWTmin}{-1 + 2^Q} * DWTCQ\right) + DWTmin\tag{11}$$
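Equations (8)–(11) amount to the following quantize/dequantize round trip; this is a sketch with hypothetical helper names, assuming Eq. (11) is the inverse (dequantization) of Eq. (10).

```python
# Uniform quantization of thresholded DWT coefficients (Eqs. 8-11).
# Q sets the number of levels (2**Q); DWTmin/DWTmax bound the coefficients.

def quantize(y, Q):
    lo, hi = min(y), max(y)                    # Eqs. (8)-(9)
    scale = (2 ** Q - 1) / (hi - lo)
    q = [round((v - lo) * scale) for v in y]   # Eq. (10): codes in [0, 2**Q - 1]
    return q, lo, hi

def dequantize(q, Q, lo, hi):
    scale = (hi - lo) / (2 ** Q - 1)
    return [round(v * scale) + lo for v in q]  # Eq. (11)

y = [-4.0, 0.0, 2.0, 10.0]                     # toy thresholded coefficients
q, lo, hi = quantize(y, Q=4)
print(q)                                       # integer codes
print(dequantize(q, 4, lo, hi))                # approximate reconstruction
```

With Q = 4 the codes span 0–15; larger Q reduces the rounding error at the cost of more bits per coefficient.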

#### 3.5 Encoding of wavelet coefficients using run-length encoder and Huffman encoder

In this implementation, the hierarchical relationship of the wavelet structure is exploited to arrange the wavelet coefficients into odd and even rows. The odd rows contain an ordering of wavelet coefficients that act as approximate (smooth) values, and the even rows contain signed difference data that act as detail values. After evaluating the many zeros in the different wavelet orders, the hard thresholding condition of Eqs. (4) and (5) is applied to the wavelet coefficients. In this process, the separated approximate coefficients carry the most important information, while the detail coefficients carry information such as the shapes and edges of the image. The threshold condition uses a fixed threshold value to obtain the desired quality of the reconstructed image. After classifying the thresholded coefficients, they must be transmitted with a lossless method so that they can be used for decompression. The detail coefficients are encoded with run-length encoding, excluding the highest approximate coefficients of the LL3 sub-band, because the LL3 sub-band does not have long runs of zeros. To convert the repetitive data into a bit stream, a Huffman encoder is used in this implementation. The Huffman code is an example of an optimum prefix code; the codes have variable lengths, using only as many bits as essential. This helps in average code length calculations, and data compression takes place such that the compressed image is sometimes much smaller than the original image.
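The RLE-then-Huffman stage can be sketched as follows; the helper names are illustrative, not the chapter's actual implementation, and the Huffman code is built over the (value, run) pairs produced by the run-length encoder.

```python
import heapq
from collections import Counter

def rle_encode(seq):
    """Collapse repeats into (value, run_length) pairs; zero runs shrink most."""
    out, i = [], 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        out.append((seq[i], j - i))
        i = j
    return out

def huffman_code(symbols):
    """Map each symbol to a prefix-free bit string based on its frequency."""
    heap = [[w, n, {s: ""}] for n, (s, w) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)               # two least frequent subtrees
        hi = heapq.heappop(heap)
        code = {s: "0" + c for s, c in lo[2].items()}
        code.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], n, code]); n += 1
    return heap[0][2]

detail = [0, 0, 0, 0, 5, 0, 0, -3, 0, 0, 0, 0, 0, 5]   # toy detail coefficients
pairs = rle_encode(detail)
code = huffman_code(pairs)
bits = "".join(code[p] for p in pairs)
print(pairs)
print(len(bits), "bits")
```

Frequent symbols (long zero runs) receive the shortest codewords, which is exactly why thresholding first, to create long zero runs, pays off.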

#### 4. Results and discussion

The performances of implemented method are based on few essential criteria: the obtained compression ratio (CR), compression gain, and the quality of the


reconstructed image using PSNR, MSE (mean squared error), and SNR. Data compression equations are given below.

Data compression ratio = Uncompressed size/Compressed size

$$CR = \frac{X}{Y} \tag{12}$$


Space saving (%) measures the transformation efficiency in terms of stored data bits, relating the compressed bit size to the original bit size. It is similar to the compression ratio; however, it reflects the percentage of data space saved by compression [21]. It is given by the following equation:

Percentage of Compression Gain = 1 – Compressed size/Uncompressed Size

$$\text{Compression Gain} = \left(1 - \frac{Y}{X}\right) * 100 \tag{13}$$
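Equations (12) and (13) reduce to two one-line helpers; the 36 KB to 9 KB example reproduces the Daubechies row of Table 5.

```python
# Compression ratio (Eq. 12) and compression gain / space saving (Eq. 13).

def compression_ratio(x_kb, y_kb):
    return x_kb / y_kb                          # uncompressed / compressed

def compression_gain(x_kb, y_kb):
    return (1 - y_kb / x_kb) * 100              # percent of space saved

print(compression_ratio(36, 9))                 # -> 4.0
print(compression_gain(36, 9))                  # -> 75.0
```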

The peak signal-to-noise ratio is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. The PSNR is generally expressed on a logarithmic decibel scale [22].

$$PSNR = 20\log_{10}\left(\frac{MAX_f}{\sqrt{MSE}}\right) \tag{14}$$

MAXf is the maximum signal value that exists in the original image; a higher PSNR indicates a better-quality reconstruction. The MSE is defined as

$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left\| f(i,j) - g(i,j) \right\|^2 \tag{15}$$

where f is the matrix data of the original image, g is the matrix data of the degraded image, m is the number of rows of pixels, i is the row index, n is the number of columns of pixels, and j is the column index.

The SNR is given as the ratio of the mean value of the signal to the standard deviation of the noise.

$$\text{SNR} = 20 \ast \log \left( \frac{\text{Intensity Signal}}{\text{Intensity Noise}} \right) \tag{16}$$
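The three quality metrics of Eqs. (14)–(16) can be computed directly; this sketch uses nested lists for the images and illustrative helper names.

```python
import math

# MSE (Eq. 15), PSNR (Eq. 14), and SNR (Eq. 16) for two grayscale images
# given as nested lists; MAX_f is the maximum value of the original image.

def mse(f, g):
    m, n = len(f), len(f[0])
    return sum((f[i][j] - g[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)

def psnr(f, g):
    max_f = max(max(row) for row in f)
    return 20 * math.log10(max_f / math.sqrt(mse(f, g)))

def snr(intensity_signal, intensity_noise):
    return 20 * math.log10(intensity_signal / intensity_noise)

f = [[52, 55], [61, 59]]                       # original 2x2 image
g = [[52, 54], [60, 59]]                       # degraded 2x2 image
print(mse(f, g))                               # -> 0.5
print(round(psnr(f, g), 2))
```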

To analyze the performance of our proposed method, we take the MR image described in Table 3 as input for the evaluation of various parameters; its information is as follows:


| Image name | Input size (in KB) | Level of decomposition (N) | Noise variance (σ²) | Size of the DWT coefficient arrays (Ly) | Threshold value (λ) |
|---|---|---|---|---|---|
| 1.2.840.113619.2.5.1762583153.215519.978957063.122.dcm | 522 | 3, 4 | σ = 1 | 266,539 | 3.29416477 |

Table 3.
Input MR image for evaluation of various parameters.

Figure 6 shows the pop-up message generated after compression of the DICOM input image: a dialog reports that the image is compressed successfully, and after pressing the OK button, the compressed image is stored in the image folder with the .hdwt extension (Figure 7).

#### Figure 6.

Graphical user interface for the DICOM image 1.2.840.113619.2.5.1762583153.215519.978957063.122.dcm of 522 KB for N = 3, after pressing the pop-up button: image compressed successfully, with the compressed size displayed.

#### Figure 7.

Graphical user interface for the DICOM image 1.2.840.113619.2.5.1762583153.215519.978957063.122.dcm of 522 KB for N = 4, displaying the PSNR, MSE, and SNR parameters after pressing the decompression pop-up button.


#### Figure 8.

The plot for size of compressed image in KB vs wavelet types for N = 3 and 4.

Figure 8 shows that for an input image size of 522 KB, Biorthogonal DWT gives the highest compressed size and Reverse Biorthogonal DWT gives the lowest compressed size. Decomposition level N = 4 gives a better result than N = 3.

Figure 9 compares the compression ratio at decomposition levels 3 and 4 for various wavelet types, where the size of the input MRI is 522 KB. Biorthogonal DWT gives a higher compression ratio, while Discrete Meyer DWT and Reverse Biorthogonal DWT give lower compressed sizes. N = 4 gives a better compression ratio (15:1 for Biorthogonal DWT and 10:1 for Daubechies DWT) than N = 3.

#### Figure 9.

The plot of compression ratio vs wavelet types for input (MRI) image size 522 KB, comparing N = 3 and 4.

Figure 10 compares the compression gain at decomposition levels 3 and 4 for various wavelet types, where the size of the input MRI is 522 KB. Biorthogonal DWT gives a higher compression gain, and Discrete Meyer DWT and Reverse Biorthogonal DWT give lower compression gains. N = 4 gives a better compression gain (93.6781% for Biorthogonal DWT and 90.6130% for Daubechies DWT) than N = 3 (Figure 11).

Table 4 compares the image quality parameters when the application runs at decomposition levels 3 and 4 for various wavelet types, where the size of the input MRI is 522 KB. Daubechies DWT gives the highest PSNR (42.0998 dB), with an MSE of 35.2626 and an SNR of 28.3993 dB, so the picture quality is good for N = 3. When we consider the N = 4 decomposition level, we can also choose Daubechies DWT for compression to achieve a higher quality medical image.

#### Figure 10.


The plot for compression gain vs wavelet types where image size of input (MRI) image size 522 KB for N = 3 and 4.


#### Figure 11.

The 36 KB of input .dcm file, compressed HDWT file, text file, and patient data integrated into the image file for Biorthogonal DWT.


Table 4.

Input MRI .dcm file input having 522 KB size for evaluation of various image quality parameters.

This tool does not work on Haar DWT for this input; there is no output for the Haar wavelet type.

Table 5 indicates that Daubechies, Symlets, and Coiflets DWT give the smallest compressed image size (9 KB) and the highest compression ratio (4) and gain (75%). Decompression works at this input size (36 KB); so when a small input .dcm file must be reconstructed, this implementation is applicable. The application gives a good PSNR (45.1486) and better MSE (29.88) and SNR (30.0124) for Daubechies DWT.

Table 6 indicates that Biorthogonal DWT gives the smallest compressed image size (2 KB) and the highest compression ratio (18:1) for threshold value = 50, but the image quality is degraded, with a PSNR value of 32.7496.

Table 7 shows that CT0081 gives a good compression result for the implemented method, but the other input CT images give a lower compression ratio than the .jpg file format.

Initial size of image: 522 KB (image name: 1.2.840.113619.2.5.1762583153.215519.978957063.122.dcm)

| Wavelet type | PSNR, N = 3 (dB) | MSE, N = 3 (dB) | SNR, N = 3 (dB) | PSNR, N = 4 (dB) | MSE, N = 4 (dB) | SNR, N = 4 (dB) |
|---|---|---|---|---|---|---|
| Biorthogonal DWT | 35.3358 | 35.2626 | 21.6352 | 30.7777 | 35.3626 | 17.0771 |
| Haar DWT | No output | No output | No output | No output | No output | No output |
| Daubechies DWT | 42.0998 | 35.2626 | 28.3993 | 39.6729 | 35.3626 | 25.9723 |
| Symlets DWT | 39.0696 | 35.3626 | 25.3691 | 36.6956 | 35.3626 | 22.995 |
| Coiflets DWT | 40.4769 | 35.3626 | 26.7763 | 36.1026 | 35.3626 | 22.402 |
| Reverse Biorthogonal DWT | 32.0256 | 35.3626 | 18.325 | 26.3603 | 35.3626 | 12.6597 |
| Discrete Meyer DWT | 40.6572 | 35.3626 | 26.9566 | 32.5428 | 35.3626 | 18.8423 |

Table 4.
Input MRI .dcm file input having 522 KB size for evaluation of various image quality parameters.

| Sr. no. | Wavelet type | Size of compressed image (in KB) | CR | % of CG | PSNR (in dB) | MSE (in dB) | SNR (in dB) |
|---|---|---|---|---|---|---|---|
| 1 | Biorthogonal DWT | 15 | 2.4 | 58.33333 | 35.2407 | 19.9722 | 39.0124 |
| 2 | Haar DWT | 11 | 3.272727 | 69.44444 | 44.8457 | 29.5771 | 30.0124 |
| 3 | Daubechies DWT | 9 | 4 | 75 | 45.1486 | 29.88 | 30.0124 |
| 4 | Symlets DWT | 9 | 4 | 75 | 44.9406 | 29.6721 | 39.0124 |
| 5 | Coiflets DWT | 9 | 4 | 75 | 38.9493 | 23.6808 | 39.0124 |
| 6 | Reverse Biorthogonal DWT | 18 | 2 | 50 | 35.7811 | 20.5126 | 39.0124 |
| 7 | Discrete Meyer DWT | 15 | 2.4 | 58.33333 | 44.6503 | 29.3818 | 39.0124 |

Table 5.
Compression and image quality performance of input (CT scan) image size 36 KB for different wavelet types.

| Parameters | λ = 3 | λ = 10 | λ = 20 | λ = 30 | λ = 50 |
|---|---|---|---|---|---|
| Compressed image in KB | 9 | 5 | 4 | 3 | 2 |
| Compression ratio | 4 | 7.2 | 9 | 12 | 18 |
| PSNR | 35.1707 | 34.5081 | 33.673 | 33.1783 | 32.7496 |
| MSE | 19.9022 | 19.2396 | 18.4 | 17.9098 | 17.9098 |
| SNR | 39.0124 | 39.0124 | 39.0124 | 39.0124 | 39.0124 |

Table 6.
Compression and image quality performance of input (CT scan) image size 36 KB (gray scale) for different threshold values on Biorthogonal DWT using the proposed method.

| Sr. no. | Image ID | Dimension | Depth in bit | DICOM size in KB | Input file size in KB | .jpg file size in KB | Implemented method .hdwt size in KB |
|---|---|---|---|---|---|---|---|
| 1 | CT0014 | 512 × 512 | 24 | 1030 | 769 | 25 | 29 |
| 2 | CT0051 | 512 × 200 | 24 | 204 | 301 | 17 | 22 |
| 3 | CT0052 | 250 × 512 | 24 | 254 | 376 | 17 | 26 |
| 4 | CT0059 | 350 × 512 | 24 | 353 | 526 | 23 | 46 |
| 5 | CT0074 | 512 × 512 | 24 | 5130 | 769 | 61 | 67 |
| 6 | CT0081 | 888 × 733 | 24 | 2547 | 1908 | 167 | 165 |
| 7 | CT0090 | 512 × 512 | 24 | 3591 | 769 | 93 | 152 |
| 8 | CT0101 | 512 × 512 | 24 | 516 | 769 | 92 | 71 |
| 9 | CT102 | 512 × 605 | 24 | 609 | 909 | 153 | 162 |
| 10 | CT110 | 512 × 512 | 24 | 4616 | 769 | 61 | 68 |

Table 7.
Comparison of the implemented method and .JPG format for 10 CT scan DICOM images.

#### 5. Conclusions

Compression and decompression are necessary tasks in medical imaging applications. This implementation provides patient data integration within the medical image; maintaining the security of patient data is very important. The implemented work is very helpful for hiding and recovering patient information within the medical image, with compression/decompression and without any data loss. In this implemented work, 2D DWT and N-level decomposition are applied to the medical image; the extracted detail coefficients are first encoded by RLE, and then the extracted approximate coefficients and the encoded detail coefficients are encoded by the Huffman encoder. The generated results show that Biorthogonal DWT gives better compression size, compression ratio, and compression gain at the higher decomposition level (N = 4), but image quality parameters like PSNR and MSE are degraded. In comparison with the JPEG file format, the implemented work gives a smaller compressed size. We conclude that for medical image compression, we can select decomposition level N = 3 with threshold value λ = 3 and Biorthogonal DWT for good image quality.

#### Acknowledgements

The authors thank Dr. S.V. Dudal, HOD, Department of Applied Electronics, SGBA University, Amravati, India, for providing all kinds of facilities and support.


#### Conflict of interest

The authors declare that there is no conflict of interests, financial, potential, or otherwise associated with this manuscript.


#### Author details

Trupti N. Baraskar<sup>1</sup> \* and Vijay R. Mankar<sup>2</sup>

1 Department of Electronic Engineering, Sant Gadge Baba Amravati University, Amravati, India

2 Department of Electronic and Telecommunication Engineering, Government Polytechnic, Washim, India

\*Address all correspondence to: trupti.baraskar@mitwpu.edu.in

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The DICOM Image Compression and Patient Data Integration using Run Length and Huffman… DOI: http://dx.doi.org/10.5772/intechopen.89143

#### References


[1] NEMA Publications. DICOM Standard. Digital Imaging and Communications in Medicine (DICOM). 2008. Available from: ftp://medical.nema.org/medical/dicom/2008/

[2] Cho K, Kim J, Jung SY, Kim K, Kuhng H-K. Development of Medical Imaging Viewer Role in DICOM Standard. Daegu University, School of Computer and Communication Engineering; ETRI, Broadcasting Media Research Group. IEEE; 2005

[3] Mousa WA, Shwehdi MH, Abdul-Malek MA. Conversion of DICOM System Images to Common Standard Image Format Using Matlab. Asia SENSE. SENSOR; 2003. pp. 251–255

[4] Suapang P, Dejhan K, Yimmun S. Medical Image Compression and DICOM-Format Image Archive. Bangkok, Thailand: Department of Telecommunication Engineering, King Mongkut's Institute of Technology Ladkrabang, ICCAS-SICE; 2009

[5] Larobina M, Murino L. Medical image file format. Journal of Digital Imaging. 2014;27(2):200-206. DOI: 10.1007/s10278-013-9657-9

[6] Verma DR. Managing DICOM image: Tips and tricks for radiologist. Indian Journal of Radiology and Imaging. 2012; 22(1):004-013. DOI: 10.4103/ 0971-3026.95396

[7] Graham RNJ, Perriss RW, Scarsbrook AF. DICOM demystified: A review of digital file formats and their use in radiological practice. Clinical Radiology. 2005;60(11):1133-1140. DOI: 10.1016/j.crad.2005.07.003

[8] Baraskar T, Pawar A. Conversion of DICOM image to common standard image formats. International Journal of Emerging Technology and Advanced Engineering. 2013;3:1-5

[9] Ujgare NS, Baviskar SP. Conversion of DICOM image into JPEG, BMP and PNG image format. International Journal of Computer Applications. 2013;62(11):22-26

[10] Kaur S, Jindal G. Survey of databases used in image processing and their applications. International Journal of Scientific and Engineering Research. 2011;2(10):1-9

[11] Chen P. Study on medical image processing technologies based on DICOM. Journal of Computers. 2012;7(10):1-8. DOI: 10.4304/jcp.7.10.2354-2361

[12] Vinayak B, Gaikwad AN, Kanaskar M. DICOM medical data compression for telemedicine in rural areas. Advances in Engineering Science. 2008;2:001-006

[13] Zhifeng L, Changhong F, Xu F, Zhicong Q, Shunxiang W. An easy image compression method and its realization based on MATLAB. In: Information Engineering and Computer Science International Conference; 2009

[14] Dimitrovski I, Guguljanov P, Loskovska S. Implementation of web-based medical image retrieval system in Oracle. In: IEEE 2nd Intl. Conference on Adaptive Science & Technology; 2009

[15] Xiaolei SHI, Wang M. Transformation of DICOM digital medical image format into BMP general image format. Microcomputer Informatics. 2010;26:195-197

[16] Aliming H. The conversion between DICOM medical image format and common graphic format [master's degree dissertation]. Chengdu, Sichuan: Sichuan University; 2006


[17] Cyriac M, Chellamuthu C. Medical image compression using visual quantization and modified run length encoding. Biomedical Imaging and Intervention Journal. 2013;7(2):1-8. DOI: 10.2349/biij9.2.e5

[18] Hussain AJ, Al-Fayadh A, Radi N. Image compression technique: A survey in lossless and lossy algorithms. Neurocomputing. 2018;300:44-69

[19] Akhtar MB, Qureshi AM, Qamar ul Islam. Optimum Run Length Coding for JPEG Image Compression used in Space Research Program of IST. IEEE; 2011

[20] Janet J, Mohandass D, Meenalosini S. Lossless Compression Techniques for Medical Images in Telemedicine-Advances in Telemedicine Technologies, Enabling Factors, Scenarios. Austria: INTECH; 2011. pp. 111-130. ISBN 978- 953307-159-6

[21] Salomon D. Data Compression-the Complete Reference. 2nd ed. Heidelberg: Springer; 2001

[22] Shajun Nisha S, Kothar Mohideen S. Wavelet coefficients thresholding techniques for denoising MRI images. Indian Journal of Science and Technology. 2016;9(28):1-8. DOI: 10.17485/ijst/2016/v9i28/93872

[23] Ray A. Performance evaluation of various image compression techniques using SVD, DCT and DWT. International Journal of Research in Engineering and Technology. 2017;6(6): 1-6



### *Edited by Sudhakar Radhakrishnan and Muhammad Sarfraz*

This book is intended to attract the attention of practitioners and researchers in academia and industry interested in challenging paradigms of coding theory and computer vision. The chapters in this comprehensive reference explore the latest developments, methods, approaches, and applications of coding theory in a wide variety of fields and endeavours. The book is compiled with a view to providing researchers, academicians, and readers with an in-depth discussion of the latest advances in this field. It consists of twelve chapters authored by academicians, practitioners, and researchers from different disciplines around the world, covering the field of coding theory and image and video processing. The book is aimed mainly at researchers doing quality research in coding theory, image and video processing, and related fields. Each chapter is an independent research study that will motivate young researchers to think further. The twelve chapters are presented in three sections and will be an eye-opener for all systematic researchers in these fields.

Published in London, UK © 2020 IntechOpen © noLimit46 / iStock
