**Meet the editor**

Dr Mithun Das Gupta received his bachelor's degree from the prestigious Indian Institute of Technology, Kharagpur in 2001, and his master's and PhD from the ECE Department at the University of Illinois Urbana-Champaign in 2008. He was a member of the Image Formation and Processing Lab at the Beckman Institute for Advanced Science and Technology. His areas of interest include computer vision and image processing, graphical models for image classification, machine learning, approximate inference, convex optimization techniques and dimensionality reduction. He has published in a wide variety of prestigious conferences and journals. He is currently a Lead Scientist at GE Global Research, Bangalore, India.

## Contents

Chapter 1 Recent Advances in Watermarking for Scalable Video Coding
Dan Grois and Ofer Hadar

Chapter 2 Perceptual Image Hashing
Azhar Hadmi, William Puech, Brahim Ait Es Said and Abdellah Ait Ouahman

Chapter 3 Robust Multiple Image Watermarking Based on Spread Transform
Jaishree Jain and Vijendra Rai

Chapter 4 Real Time Implementation of Digital Watermarking Algorithm for Image and Video Application
Amit Joshi, Vivekanand Mishra and R. M. Patrikar

Chapter 5 Sophisticated Spatial Domain Watermarking by Bit Inverting Transformation
Tadahiko Kimoto

Chapter 6 Performance Evaluation for IP Protection Watermarking Techniques
Tingyuan Nie

Chapter 7 Using Digital Watermarking for Copyright Protection
Charlie Obimbo and Behzad Salami

Chapter 8 2D Watermarking: Non Conventional Approaches
Hassen Seddik

Chapter 9 Audio Watermarking for Automatic Identification of Radiotelephone Transmissions in VHF Maritime Communication
Oleksandr V. Shishkin and Vitaliy M. Koshevyy




## Preface

This collection of books brings together some of the latest developments in the field of watermarking. Researchers with varied backgrounds and expertise contribute a remarkable collection of chapters that render this work an important piece of scientific research. The chapters deal with a gamut of fields where watermarking can be used to encode copyright information. The work also presents a wide array of algorithms, ranging from intelligent bit replacement to more traditional methods like ICA. The current work is split into two books. Book one is more traditional in its approach, dealing mostly with image watermarking applications. Book two deals with audio watermarking and includes several chapters on the performance analysis of algorithms.

> **Mithun Das Gupta**
> Bio Signals and Analysis Lab, GE Global Research, Bangalore, India


## **Recent Advances in Watermarking for Scalable Video Coding**

Dan Grois and Ofer Hadar

*Ben-Gurion University of the Negev, Beer-Sheva Israel* 

#### **1. Introduction**

The H.264/AVC (ISO/IEC MPEG-4 Part 10) video coding standard (Wiegand & Sullivan, 2003), which was officially issued in 2003, has become a challenge for real-time video applications. Compared to the MPEG-2 standard, it gains about 50% in bit rate while providing the same visual quality. In addition to having all the advantages of MPEG-2 (ITU-T & ISO/IEC JTC 1, 1994), H.263 (ITU-T, 2000), and MPEG-4 (ISO/IEC JTC 1, 2004), the H.264 video coding standard introduces a number of improvements, such as context-adaptive binary arithmetic coding (CABAC), enhanced transform and quantization, prediction of "Intra" macroblocks, and others. H.264 is designed for both constant bit rate (CBR) and variable bit rate (VBR) video coding, and is useful for transmitting video sequences over statistically multiplexed networks, Ethernet, or other Internet networks. This video coding standard can also be used at any bit rate range for various applications, ranging from wireless video phones to high-definition television (HDTV) and digital video broadcasting (DVB). In addition, H.264 provides significantly improved coding efficiency and greater functionality, such as rate scalability, "Intra" prediction and error resilience, in comparison with its predecessors MPEG-2 and H.263. However, H.264/AVC is much more complex than other coding standards, and high computational resources are required to achieve maximum-quality encoding (Grois et al., 2010a; Kaminsky et al., 2008).

Due to recent technological achievements and trends, high-definition, highly interactive networked media applications pose challenges to network operators. The variety of end-user devices with different capabilities, ranging from cell phones with small screens and restricted processing power to high-end PCs with high-definition displays, has stimulated significant interest in effective technologies for video adaptation in terms of spatial format, power consumption and bit rate. As a result, much of the attention in the field of video adaptation is currently directed to Scalable Video Coding (abbreviated as "SVC" or "H.264/SVC"), which was standardized in 2007 as an extension of H.264/AVC (Schwarz et al., 2007), since bit-stream scalability for video is currently a very desirable feature for many multimedia applications (Grois et al., 2010b; Grois et al., 2010c).

Scalable video coding has been an active research and standardization area for at least 20 years (Schwarz et al., 2007). The prior international video coding standards MPEG-2 (ITU-T & ISO/IEC JTC 1, 1994), H.263 (ITU-T, 2000), and MPEG-4 (ISO/IEC JTC 1, 2004) already include several tools by which the most important scalability modes can be supported. However, the scalable profiles of those standards have rarely been used. Reasons for that include the characteristics of traditional video transmission systems as well as the fact that the spatial and quality scalability features came along with a significant loss in coding efficiency as well as a large increase in decoder complexity as compared to the corresponding non-scalable profiles (Schwarz et al., 2007; Wiegand & Sullivan, 2003).

To fulfill these requirements, it would be beneficial to simultaneously transmit or store video in a variety of spatial/temporal resolutions and qualities, leading to video bitstream scalability. The major requirement for Scalable Video Coding is to enable the encoding of a high-quality video bitstream that contains one or more subset bitstreams, each of which can be transmitted and decoded to provide video services with lower temporal or spatial resolutions, or with reduced fidelity, while retaining a reconstruction quality that is high relative to the rate of the subset bitstreams. Therefore, Scalable Video Coding provides important functionalities, such as spatial, temporal and SNR (quality) scalability, thereby enabling power adaptation. In turn, these functionalities lead to enhancements of video transmission and storage applications (Grois et al., 2010b; Grois et al., 2010c; Grois & Hadar, 2011).

A Scalable Video Coding bitstream contains a Base-Layer (*Layer 0*) and one or more Enhancement Layers (*Layers 1, 2, etc.*), where the Base-Layer provides the lowest bitstream resolution with regard to the spatial, temporal and SNR/quality scalability, as schematically presented in *Figure 1* (Schierl et al., 2007).

Fig. 1. Schematic representation of the SVC bitstream: the resolution increases with the layer index, while the Base-Layer (*Layer 0*) has the lowest bitstream resolution (Schierl et al., 2007).
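To make the layering idea concrete, the toy sketch below (in Python, with invented field names, not the actual SVC syntax) models a scalable stream as an ordered list of layers and adapts it to a terminal by simply discarding the upper enhancement layers, so that the Base Layer always survives; it is a minimal illustration of the principle, not an SVC implementation.

```python
# Toy model of a layered scalable stream; field names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class Layer:
    index: int          # 0 = Base Layer, 1..n = Enhancement Layers
    width: int
    height: int
    frame_rate: float
    payload: bytes      # coded data belonging to this layer

def extract_substream(layers: List[Layer], max_layer: int) -> List[Layer]:
    """Adapt the stream to a terminal by keeping only layers 0..max_layer."""
    return [l for l in layers if l.index <= max_layer]

stream = [
    Layer(0, 320, 180, 15.0, b"...base..."),
    Layer(1, 640, 360, 30.0, b"...enh1..."),
    Layer(2, 1280, 720, 30.0, b"...enh2..."),
]
mobile_stream = extract_substream(stream, 0)   # base layer only, for a small terminal
print([l.index for l in mobile_stream])        # -> [0]
```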


The term "scalability" refers to the removal of parts of the video bit stream in order to adapt it to the various needs or preferences of end users as well as to varying terminal capabilities or network conditions. According to (Schwarz et al., 2007), the objective of the SVC standardization has been to enable the encoding of a high-quality video bit stream that contains one or more subset bit streams that can themselves be decoded with a complexity and reconstruction quality similar to that achieved using the existing H.264/AVC design with the same quantity of data as in the subset bit stream. *Figure 2* below presents a blockdiagram of a SVC encoder, which has for simplicity two spatial layers: *Layer 0*, which is the Base Layer, and *Layer 1*, which is the first Enhancement Layer. It should be noted that in order to improve the coding efficiency of the Scalable Video Coding in comparison to simulcasting of different spatial resolutions, additional "inter-layer prediction mechanisms" are incorporated (Schwarz et al., 2007).

Fig. 2. Block-diagram of the spatial SVC encoding scheme (for simplicity, only two layers are presented: *Layer 0*, which is the Base Layer, and *Layer 1*, which is the first Enhancement Layer).

Scalable Video Coding has achieved significant improvements in coding efficiency compared to the scalable profiles of prior video coding standards. As a result, Scalable Video Coding is currently a highly attractive solution to the problems posed by the characteristics of modern video transmission systems (Schwarz et al., 2007).


Scalable Video Coding poses new challenges for watermarking that need to be addressed to achieve full protection of the scalable content (Meerwald, 2011; Lin et al., 2004), while maintaining low bit-rate overhead due to watermarking. Challenges that complicate watermark detection include the very different statistics of the transform domain coefficients of scalable base- and enhancement layers, the combination of multi-channel detection results for incremental detection performance (Piper et al., 2005), as well as the prediction of data between scalability layers which complicates the modeling of the embedding domain. Despite intense research in the area of image and video watermarking (Meerwald, 2011; Lin et al., 2004), the peculiarities of watermarked scalable multimedia content have received limited attention and a number of challenges remain.

One of the main challenges in watermarking rate-scalable compressed video is that not all receivers will have access to the entire (watermarked) video stream (Lin et al., 2001). The embedded watermark must be detectable when only the base layer is decoded (for layered and hybrid layered/embedded methods) or for a low-rate version of the video stream (for embedded methods). However, the enhancement information adds value to the video stream and should not be left unprotected by a watermark. Ideally, there should be a uniform improvement in the detectability of an embedded watermark as the decoded rate increases.

According to one method for watermarking rate-scalable video streams, a watermark is embedded in the base layer and a separate watermark is embedded in the enhancement layer(s) (Lin et al., 2001). For temporal scalability, this is an effective method for watermarking, as the enhancement information does not alter the frames encoded in the base layer. However, for other forms of scalability, care must be taken so that the multiple watermarks do not interfere with each other once the decoder merges the base and enhancement information. The watermarks could interfere in visibility, where the distortions introduced by adding all watermarks are unacceptable, or in detectability, where the presence of all the watermarks impairs the ability to detect each watermark individually. The ability to detect each embedded watermark individually (before the enhancement and base information are merged) is not sufficient for a robust watermark, as such a system would be vulnerable to a collusion attack between the non-enhanced and enhanced versions of the video.

For embedded scalability modes, one could design a watermark analogous to an embedded coding scheme, where the most significant structures of the watermark are placed near the beginning of the video stream, followed by structures of lesser significance (Lin et al., 2001).

With this regard, *Figure 3* below presents different watermarking embedding schemes by using the SVC spatial scalability (Meerwald, P. & Uhl, A., 2010a).

Watermarking systems are often characterized by a set of common features, and the importance of each feature depends on the application requirements. Watermarks are generally divided into three main groups (Piper, 2010):


a. *Robust:* Robust watermarks are designed to be resistant to manipulations of the content. Therefore, a robust watermark can still be detected after the content has undergone processing, such as resampling, cropping, lossy compression, and the like.

b. *Fragile:* Fragile watermarks are very sensitive to any manipulation of the content. This does not make the fragile watermark inferior to the robust watermark, since different applications demand different amounts of robustness or fragility.

c. *Semi-Fragile:* Semi-fragile watermarks are designed to be fragile with respect to some changes but to tolerate others. For example, they may be robust to compression but still able to detect malicious tampering. This can be achieved by carefully designing the watermark to be robust to particular image/video manipulations.

Fig. 3. Three different watermarking embedding schemes by using spatial Scalable Video Coding (Meerwald, P. & Uhl, A., 2010a): a) Watermark embedding prior to the video encoding; b) Integrated watermark embedding and coding; and c) Compressed-domain embedding after encoding.

Further, *Table 1* below presents common watermarking applications, which are used with regard to different watermark features (Bhowmik, 2010):

| Application Name | Description |
|---|---|
| *Broadband Monitoring* | Passive monitoring by the automatic watermark detection of the broadcast watermarked media. |
| *Copyright Identification* | Resolving copyright issues of digital media by using the watermark information as the copyright data. |
| *Ownership Identification* | Supporting a legitimate claim, such as royalty, by the media owner. |
| *Content Authentication* | Authentication of original art work, performance and protection against digital forgery. |
| *Transaction Tracking* | Tracking of the media ownership in a buyer-seller scenario. |
| *Meta-data Hiding* | Hiding meta-data within the media instead of a big header. |
| *Video Summary Creation* | Instant retrieval of a video summary by embedding the summary within the host video. |
| *Copy Control* | Disabling copying of CDs/DVDs by the watermarked permission. |
| *Access Control* | Access control applications, such as Pay-TV. |
| *Media Piracy Control* | Tracking of the source of media piracy. |
| *Packaging and Tracking* | Transaction tracking and protection against forged consumable items (including pharmaceutical products, and the like) by embedding a watermark on the packaging. |
| *Medical Record Authentication* | Authentication of digitally preserved patient medical records, including blood samples, X-rays, etc. |
| *Insurance / Banking Document Authentication* | Digital authentication of insurance claims, banking, financial, mortgage and corporate documents. |
| *Video Hosting Authentication* | Piracy control by video authentication at video hosting servers, including Youtube™, etc. |

Table 1. Common watermarking applications (Bhowmik, 2010).

Since robust watermarking algorithms, which are designed specifically for robustness, are preferred in the majority of watermarking applications, we mainly focus this chapter on this type of watermarking. We also place special emphasis on combined schemes of watermarking and encryption using H.264/SVC, due to the increasing interest in this issue.

This chapter is organized as follows: in *Section 2*, we present recent advances in robust watermarking by using Scalable Video Coding; in *Section 3*, we discuss recent advances in scalable fragile watermarking; in *Section 4*, we present recent compressed-domain watermarking techniques by using Scalable Video Coding; and in *Section 5*, we discuss combined schemes of watermarking and encryption by using Scalable Video Coding. Future research directions are outlined in *Section 6*, and the chapter is concluded in *Section 7*.

#### **2. Robust watermarking by using scalable video coding**

In general, digital watermarking has been proposed as a solution to the problem of copyright protection of multimedia data in the complicated network environment (Shi et al., 2010). Especially in today's society, with the progress of 3G/4G wireless networks and the plurality of heterogeneous mobile devices, multimedia resources must be accessed by many different terminals, which requires the single source multimedia stream to meet varying terminal capabilities. Thus, Scalable Video Coding can be efficiently employed to achieve these goals. However, due to the SVC scalability, the source video stream can be decoded into a plurality of streams, each having a different resolution, frame rate and video presentation quality, according to each end-user terminal. Therefore, there are many challenges for watermarking using the Scalable Video Coding approach (Shi et al., 2010).


It should be noted that using prior knowledge of the Scalable Video Coding system and the transmission channel is beneficial for the watermarking system (Meerwald & Uhl, 2008), thereby enabling the use of the number of supported spatial and temporal layers, denoising and deblocking filters, and the like (as schematically shown in *Figure 4*). By exploiting the host video as side-information at the encoder, in message coding and watermark embedding, the negative impact of host-signal noise on the watermark decoder performance can be cancelled (Cox et al., 2002).

Fig. 4. Schematic diagram of the watermark communication channel by using Scalable Video Coding for blind watermarking (Meerwald & Uhl, 2008).

With regard to this issue, (Meerwald & Uhl, 2008) present a frame-by-frame scalable watermarking scheme that is robust to spatial, temporal and quality scalability, in which the luminance component of each frame is decomposed using a two-level wavelet transform with a 7/9 bi-orthogonal filter. Separate watermarks are embedded in the approximation and in each detail subband layer. According to (Meerwald & Uhl, 2008), an additive spread-spectrum watermark $w_l(n,m)$ is added to the detail subband coefficients $d_{l,o}(n,m)$:

$$d'_{l,o}(n,m) = d_{l,o}(n,m) + \alpha \cdot s_{l,o}(n,m) \cdot w_l(n,m), \tag{1}$$

where $\alpha$ is a global strength factor and $s_{l,o}(n,m)$ is a perceptual shaping mask derived from a combined local noise and frequency sensitivity model; $l$ and $o$ indicate the hierarchical level and orientation of the subband. Blind watermark detection can be performed independently for each hierarchical layer by using normalized correlation coefficient detection. By applying a high-pass 3×3 Gaussian filter to the detail subbands prior to the correlation, some of the host interference is suppressed, which improves the detection statistics. Also, a different key is used for each frame to generate the watermark pattern (Meerwald & Uhl, 2008).
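As a rough illustration of the additive model in Eq. (1), the sketch below embeds a pseudo-random ±1 watermark into a block of coefficients and detects it blindly with a normalized correlation. The perceptual mask $s_{l,o}(n,m)$ is replaced by a constant, and the wavelet decomposition and Gaussian pre-filtering of (Meerwald & Uhl, 2008) are omitted, so this is only a sketch of the principle under those simplifying assumptions, not that scheme.

```python
import numpy as np

def embed_additive_ss(coeffs: np.ndarray, key: int, alpha: float = 1.5) -> np.ndarray:
    """d'(n,m) = d(n,m) + alpha * s(n,m) * w(n,m); here s(n,m) is taken as 1."""
    rng = np.random.default_rng(key)
    w = rng.choice([-1.0, 1.0], size=coeffs.shape)   # pseudo-random +/-1 watermark pattern
    return coeffs + alpha * w

def detect_additive_ss(coeffs: np.ndarray, key: int) -> float:
    """Blind detection: normalized correlation between coefficients and the watermark."""
    rng = np.random.default_rng(key)
    w = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return float(np.sum(coeffs * w) /
                 (np.linalg.norm(coeffs) * np.linalg.norm(w) + 1e-12))

subband = np.random.randn(64, 64) * 10.0            # stand-in for detail subband coefficients
marked = embed_additive_ss(subband, key=42)
print(detect_additive_ss(marked, key=42))            # noticeably positive
print(detect_additive_ss(subband, key=42))           # close to zero for unmarked content
```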


Further, (Meerwald, P. & Uhl, A., 2010b) focus on watermark embedding in the intra-coded macroblocks of an H.264-coded base layer. Each macroblock of the input frame is coded by using either intra- or inter-frame prediction, and the difference between the input pixels and the prediction signal is the residual. The watermarked SVC base layer representation is used for predicting the SVC enhancement layer, as seen from *Figure 5* below (Meerwald, P. & Uhl, A., 2010b).

Fig. 5. Sample encoding watermarking structure of two spatial SVC layers (Meerwald, P. & Uhl, A., 2010b).

As already mentioned, for a scalable watermarking system the key scalable property is that the detection process is scalable (Shi et al., 2010). In other words, the system should be able to detect a watermark in all of the different scalable bit-streams. As the quality of the multimedia decreases, the correlation between the watermark and the watermarked signal may decrease as well, so detection will not work effectively if the same threshold is used for each SVC layer. However, if different detection thresholds are used for different layers, the watermarking system is required to transmit some extra side information. One potential approach is to adjust the detection threshold adaptively according to the multimedia content.
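The idea of a layer-dependent detection threshold can be pictured with the small helper below; the threshold rule is invented purely for illustration and is not taken from (Shi et al., 2010), which derives the decision adaptively from the content.

```python
import numpy as np

def normalized_correlation(x: np.ndarray, w: np.ndarray) -> float:
    return float(np.sum(x * w) / (np.linalg.norm(x) * np.linalg.norm(w) + 1e-12))

def detect_in_layer(coeffs: np.ndarray, w: np.ndarray, layer_index: int,
                    base_threshold: float = 0.08) -> bool:
    """Lower layers carry a coarser, lower-quality signal, so this (hypothetical)
    rule relaxes the threshold for smaller layer indices; a real system would
    derive the threshold from the content or transmit it as side information."""
    threshold = base_threshold * (0.6 + 0.4 * layer_index)   # illustrative rule only
    return normalized_correlation(coeffs, w) > threshold
```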

With this regard, (Shi et al., 2010) propose a scalable and credible watermarking algorithm for Scalable Video Coding (SVC), which aims to build a Copyright Protection System (CPS). The authors first investigate where to embed the watermark to ensure that it can be detected in the SVC Base Layer as well as in the Enhancement Layers, and then propose a model that combines frequency masking, contrast masking, luminance adaptation and temporal masking. Finally, whether a watermark exists or not is judged by adaptive detection, which guarantees that the proposed method has good legal credibility, since its False Alarm Rate (FAR) is close to zero.


In *Section 3*, we discuss recent advances in scalable fragile watermarking.

#### **3. Recent advances in scalable fragile watermarking**

A good authentication watermark can detect and localize any change to the video, including changes in frame rate, video size or a related video object (Wang et al., 2006). If the watermarked video is attacked by frame removal and the watermark extraction procedure is then applied to the attacked video, the procedure returns a false alarm to indicate that the video content has become incomplete. Also, if one changes the size of the watermarked video and then applies the watermark extraction procedure to this resized video, the procedure returns an output that resembles random noise, meaning a false alarm. Similarly, if one modifies a certain related video object, then the procedure will output a false alarm (Wang et al., 2006).

With this regard, (Wang et al., 2006) propose to embed the watermark information into the Enhancement Layer of MPEG-4 Fine Granularity Scalability (FGS), as schematically shown in *Figure 6*, to detect the integrity of the video stream. According to (Wang et al., 2006), suppose that $w_i$ denotes the $i$-th watermark bit, and $T_j$ denotes the total number of "1" bits in the $j$-th 8×8 bit-plane. The watermark bit $w_i$ is embedded into the $k$-th specified bit $B_k$ in the $j$-th bit-plane, and the details of the embedding can be described as follows. First, the specified bit (the $k$-th bit) in the $j$-th bit-plane is selected by a run-length-selection algorithm for embedding the $i$-th watermark bit. The run-length-selection algorithm determines a specified bit for embedding the watermark in the 8×8 residue bit-plane while obtaining an optimal coding efficiency in run-length coding. If $w_i$ is "1", then $T_j$ will be enforced to be an odd value. Similarly, if $w_i$ is "0", then $T_j$ will be enforced to be an even value. That is, the specified bit $B_k$ can be modified to $B'_k$ by the following expression:

$$B'_k = \begin{cases} 0 & \text{if } w_i \oplus E(T_j) = 0 \\ 1 & \text{if } w_i \oplus E(T_j) = 1 \end{cases} \tag{2}$$

where $E(T_j) = (T_j + 1) \bmod 2$, and $\oplus$ denotes the exclusive-OR operation.
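A toy version of the parity rule behind Eq. (2) is sketched below for a single 8×8 bit-plane. The run-length-based selection of the bit position is replaced by a fixed position $k$, and $E(\cdot)$ is treated as a plain parity indicator, so this only illustrates the even/odd enforcement described above under our own assumptions, not the full BCW algorithm of (Wang et al., 2006).

```python
import numpy as np

def embed_parity_bit(bitplane: np.ndarray, w_i: int, k: int = 0) -> np.ndarray:
    """Enforce: total number of '1' bits is odd when w_i == 1 and even when
    w_i == 0, by rewriting one selected bit (flattened position k)."""
    plane = bitplane.copy().ravel()
    others = int(plane.sum()) - int(plane[k])     # ones among the non-selected bits
    plane[k] = (others + w_i) % 2                 # selected bit completes the desired parity
    return plane.reshape(bitplane.shape)

block = (np.random.rand(8, 8) > 0.7).astype(np.uint8)   # stand-in residual bit-plane
marked = embed_parity_bit(block, w_i=1)
print(int(marked.sum()) % 2)                              # -> 1 (odd, carrying w_i = 1)
```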

Since fragile watermarking has extremely low resistance to various attacks, the extracted watermark signal fairly easily loses its completeness when the multimedia content is modified or changed by a pirate or hacker. Thus, it can be determined where the multimedia has been changed or modified illegally, according to the completeness of the extracted watermark. (Wang et al., 2006) propose a BCW (Bitplane-Coding Watermarking) algorithm to add watermark information to the residual bit-planes of the Enhancement Layer. In the embedding procedure, the watermark information is embedded into every 8×8 block of residual bit-planes in the Enhancement Layer while encoding to the MPEG-4 FGS video stream. The watermark bit is modulated by modifying a specified bit selected from each 8×8 bit-plane such that the even/odd value of the total number of "1" bits matches the corresponding watermark information. The main reason for hiding the watermark in the enhancement layers is that the degradation of the host data remains imperceptible, as the watermark signal is inserted into the enhancement layer.


Fig. 6. Embedding a watermark in an Enhancement Layer of the MPEG-4 FGS video stream (Wang et al., 2006).

In turn, *Figure 7* presents a block diagram of the watermark extraction from the Enhancement Layer of the MPEG-4 FGS video stream (Wang et al., 2006). If $E(T_j)$ is "1", the extracted watermark bit is equal to "1". Otherwise, if $E(T_j)$ is "0", the extracted watermark bit is also "0". The equation for extracting the watermark can be expressed as follows:

$$w'_i = \begin{cases} 0 & \text{if } E(T_j) = 0 \\ 1 & \text{if } E(T_j) = 1 \end{cases} \tag{3}$$


where $w'_i$ ($i = 0, 1, 2, 3, 4, \ldots$) is the $i$-th extracted watermark bit. Also, in the watermark extraction of (Wang et al., 2006), the received Enhancement Layer (EL) stream carrying the watermark data can be decoded into bit-planes through Variable-Length Decoding (VLD) at the receiver end.
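Extraction then reduces to reading the parity of each received bit-plane; the sketch below mirrors the toy embedder given after Eq. (2), again treating $E(\cdot)$ as plain parity, which is our assumption rather than a detail confirmed by (Wang et al., 2006).

```python
import numpy as np

def extract_parity_bit(bitplane: np.ndarray) -> int:
    """Eq. (3) as a parity read-out: w'_i = 1 if the received bit-plane holds an
    odd number of '1' bits, else 0."""
    return int(bitplane.sum()) % 2

# Continuing the toy example given after Eq. (2):
# extract_parity_bit(marked) -> 1
```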

Fig. 7. Extracting a watermark from an Enhancement Layer of the MPEG-4 Fine Granularity Scalability (FGS) video stream (Wang et al., 2006).

In the following *Section 4*, we discuss compressed-domain watermarking by using Scalable Video Coding techniques.

#### **4. Compressed-domain watermarking by using scalable video coding**

The concept of scalable watermarking combines progressive coding and the watermarking system (Seo & Park, 2005). Progressive watermarking techniques enable images with a built-in watermark to be transmitted progressively, and the watermark to then be extracted from the decoded images. Scalable digital watermarking is mostly related to scalable video coding techniques. Therefore, scalable digital watermarking enables content to be protected regardless of the transmission of a specific domain, and enables the watermark to be extracted from any domain of the scalable content. Also, increasing the scalable domain can reduce the error of the watermark extraction (Piper et al., 2004). In *Figure 8*, the compression is performed on the original image after the wavelet transform, and the selected coefficients and watermark key are combined, followed by the spectrum quantization and encoding (Seo & Park, 2005). Therefore, by progressively transmitting the image from the low-frequency band to the high-frequency band, the receiver can extract the watermark from the corresponding image portion that contains the built-in watermark; the bit error rate decreases as the transmitted image data carrying the built-in watermark increases (Seo & Park, 2005).


Fig. 8. Scalable watermarking in the compressed domain (Seo & Park, 2005).

In the following *Section 5*, we discuss combined schemes of watermarking and encryption by using the H.264/SVC.

#### **5. Combined schemes of watermarking and encryption by using Scalable Video Coding**

Intellectual Property (IP) protection is a critical element in a multimedia transmission system (Chang et al., 2004; Chang et al., 2005). Conventional IP protection schemes can be categorized into two major branches: *encryption* and *watermarking*. The content protection can be increased when combining the encryption and the robust watermarking, as proposed and implemented by (Chang et al., 2004; Chang et al., 2005). By taking advantage of the nature of cryptographic schemes and digital watermarking, the copyright of multimedia contents can be well protected.

In general, the Scalable Video Coding encryption can be defined as follows (Stutz & Uhl, 2011):


- *Encryption before compression:* There are no dedicated encryption proposals that take SVC-specifics into account (Stutz & Uhl, 2011).
- *Compression-integrated encryption:* The base layer is encoded similarly to AVC, thus all encryption schemes for AVC can basically be employed in the base layer. The enhancement layers can employ inter-layer prediction, but do not necessarily have to, e.g., if inter-layer prediction does not result in better compression. The compression-integrated encryption approaches for AVC, e.g., the approaches targeting the coefficient data, can also be applied for SVC.
- *Bitstream-oriented encryption:* The approach of (Stutz & Uhl, 2008) takes advantage of SVC to implement transparent encryption after compression. Several approaches have been proposed for SVC encryption (Arachchi et al., 2009; Hellwagner et al., 2009; Nithin et al., 2009), all of which preserve the NALU structure and encrypt almost the entire NALU payload. As the NALU structure is preserved, scalability is preserved in the encrypted domain (a simplified sketch of this idea follows this list).
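To make the structure-preserving point concrete, here is a hedged sketch that leaves a simplified NAL-unit header in the clear and encrypts only the payload with a toy XOR key-stream. The header length and the cipher are placeholder assumptions on our part; the actual proposals (Arachchi et al., 2009; Hellwagner et al., 2009; Nithin et al., 2009) operate on the real SVC NALU syntax and use proper ciphers.

```python
import hashlib

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy key-stream from repeated hashing; a stand-in for a real stream cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_nalu(nalu: bytes, key: bytes, nonce: bytes, header_len: int = 4) -> bytes:
    """Keep the first header_len bytes in the clear (so the NAL unit can still be
    parsed, routed and dropped per layer) and XOR-encrypt the remaining payload."""
    header, payload = nalu[:header_len], nalu[header_len:]
    ks = _keystream(key, nonce, len(payload))
    return header + bytes(p ^ k for p, k in zip(payload, ks))
```

Because the header stays readable, a network node can still drop enhancement-layer NAL units without decrypting anything; applying the same XOR with the same key and nonce recovers the payload.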

The scalable transmission method over the broadcasting environment for layered content protection is adopted by (Chang et al., 2004; Chang et al., 2005). As a result, the embedded watermark can be extracted with high confidence, and the next-layer keys/secrets can be perfectly decrypted and reconstructed. The watermarking is added in order to aid the encryption process, since the watermarked data content can withstand different types of attacks, such as distortions, image/video processing, and the like.

Further, (Park & Shin, 2008) presents a combined scheme of encryption and watermarking to provide the access right and the authentification of the video simultaneously, as schematically presented in *Figure 9*. The proposed scheme enables to protect the data content in a more secure way since the encrypted content is decrypted when the watermark is exactly detected. The encryption is performed for the access right, and the watermarking is implemented for the authentication. Particulalry, the encryption is preformed by encrypting the intra-prediction modes of the 4x4 luma block , the sign bits of texture, and the sign bits of MV difference values in the intra frames and the inter frames. In turn, a reversible watermarking scheme is implemented by using intra-prediction modes. The watermarking scheme proposed by (Park & Shin, 2008) has a small bit-overhead; however, no degradation of the visual quality occurs.

Fig. 9. Combined scheme of encryption and watermarking (Park & Shin, 2008).


The method of (Park & Shin, 2008) is applied in the Scalable Video Coding on the macroblock (MB) level in the Base Layer. The encryption and watermarking are implemented in the encoding process almost simultaneously. In turn, in the decoding process, the receiver's device extracts the watermark from the received bitstream. The extracted watermark is compared to the original one. If they match, then the received video is trusted and the encrypted bitstream is decrypted. In other words, according to (Park & Shin, 2008), only authenticated contents can be decoded in the decoding process.
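A minimal sketch of this verify-then-decrypt gate is given below; the watermark extraction and decryption routines are passed in as placeholders, since the actual scheme of (Park & Shin, 2008) operates on intra-prediction modes and sign bits inside the SVC codec.

```python
from typing import Callable

def authenticate_then_decrypt(bitstream: bytes,
                              reference_watermark: bytes,
                              extract_watermark: Callable[[bytes], bytes],
                              decrypt: Callable[[bytes], bytes]) -> bytes:
    """Decrypt the received bitstream only if the extracted watermark matches the original."""
    extracted = extract_watermark(bitstream)
    if extracted != reference_watermark:
        raise ValueError("Watermark mismatch: content is not authenticated")
    return decrypt(bitstream)  # only authenticated content reaches the decoder
```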

In the following *Section 6*, we present possible future research directions for optimizing the existing watermarking techniques for use with the Scalable Video Coding.

### **6. Future research directions**

The existing watermarking techniques for the Scalable Video Coding still have many issues to be solved in order to provide a complete solution, and possible future research directions can be outlined as follows (Bhowmik, 2010):

• Developing watermarking techniques for the Region-of-Interest (ROI) video coding by using the H.264/SVC;

• Modeling the transmission channel error and its influence on the watermark robustness for SVC applications;

• Developing real-time watermarking authentication schemes by using bitstream-domain watermarking for the H.264/SVC;

• Developing comprehensive compressed-domain SVC watermarking schemes, which enable scalability in the media distribution, while resolving digital rights management (DRM) issues.

## **7. Conclusions**

In this chapter we have presented a comprehensive overview of recent developments in the area of watermarking by using the Scalable Video Coding. As discussed, the Scalable Video Coding poses new challenges for watermarking, which have to be addressed to achieve full protection of the scalable content, while maintaining a low bit-rate overhead due to watermarking. Particularly, we presented recent advances in robust watermarking and discussed recent advances in scalable fragile watermarking; also, we presented recent compressed-domain watermarking techniques by using the Scalable Video Coding, and presented combined schemes of the SVC watermarking and encryption.

As clearly seen from this overview, there are still many challenges to be solved, and therefore further research in this field should be carried out.

#### **8. References**


Arachchi, H. K., Perramon, X., Dogan, S. & Kondoz, A. M. (2009). Adaptation-aware encryption of scalable H.264/AVC video for content security, *Scalable Coded Media beyond Compression, Signal Processing: Image Communication*, iss. 24, vol. 6, pp. 468–483, 2009.

Bhowmik, D. (2010). Robust Watermarking Techniques for Scalable Coded Image and Video, Ph.D. Thesis, Department of Electronic and Electrical Engineering, University of Sheffield, 2010.

Chang, F.-C., Huang, H.-C. & Hang, H.-M. (2004). Combined encryption and watermarking approaches for scalable multimedia coding, *Pacific-Rim Conf. on Multimedia (PCM2004)*, pp. 356–363, Dec. 2004.

Chang, F.-C., Huang, H.-C. & Hang, H.-M. (2005). Layered access control schemes on watermarked scalable media, *Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on*, vol. 5, pp. 4983–4986, 23-26 May 2005.

Cox, I. J., Miller, M. L. & Bloom, J. A. (2002). Digital Watermarking, Morgan Kaufmann, 2002.

Grois, D.; Kaminsky, E. & Hadar, O. (2010). Optimization Methods for H.264/AVC Video Coding, The Handbook of MPEG Applications: Standards in Practice, (eds M. C. Angelides and H. Agius), John Wiley & Sons, Ltd, Chichester, UK, 2010.

Grois, D.; Kaminsky, E. & Hadar, O. (2010). ROI adaptive scalable video coding for limited bandwidth wireless networks, *Wireless Days (WD), 2010 IFIP*, pp. 1-5, 20-22 Oct. 2010.

Grois, D.; Kaminsky, E. & Hadar, O. (2010). Adaptive bit-rate control for Region-of-Interest Scalable Video Coding, *Electrical and Electronics Engineers in Israel (IEEEI), 2010 IEEE 26th Convention of*, pp. 761-765, 17-20 Nov. 2010.

Grois, D. & Hadar, O. (2011). Complexity-aware adaptive bit-rate control with dynamic ROI pre-processing for scalable video coding, *Multimedia and Expo (ICME), 2011 IEEE International Conference on*, pp. 1-4, 11-15 Jul. 2011.

Hellwagner, H., Kuschnig, R., Stutz, T. & Uhl, A. (2009). Efficient in-network adaptation of encrypted H.264/SVC content, *Journal on Signal Processing: Image Communication*, iss. 24, vol. 9, pp. 740–758, Jul. 2009.

ISO/IEC JTC 1 (2004). Coding of audio-visual objects – Part 2: Visual, ISO/IEC 14492-2 (MPEG-4 Visual), version 1: Apr. 1999, version 2: Feb. 2000, version 3: May 2004.

ITU-T (2000). Video coding for low bit rate communication, ITU-T Recommendation H.263, version 1: Nov. 1995, version 2: Jan. 1998, version 3: Nov. 2000.

ITU-T and ISO/IEC JTC 1 (1994). Generic coding of moving pictures and associated audio information – Part 2: Video, ITU-T Recommendation H.262 and ISO/IEC 13818-2 (MPEG-2 Video), Nov. 1994.

Kaminsky, E.; Grois, D. & Hadar, O. (2008). Dynamic Computational Complexity and Bit Allocation for Optimizing H.264/AVC Video Compression, *J. Vis. Commun. Image R.*, Elsevier, vol. 19, iss. 1, pp. 56-74, Jan. 2008.

Lin, E., Podilchuk, C. & Kalker, T. (2001). Streaming video and rate scalable compression: what are the challenges for watermarking? *In Proceedings of SPIE 4314, Security and Watermarking of Multimedia Content III*, pp. 116–127, 2001.

Meerwald, P. & Uhl, A. (2008). Toward robust watermarking of scalable video, *In Proceedings of SPIE, Security, Forensics, Steganography, and Watermarking of Multimedia Contents*, vol. 6819, pp. 68190J ff., San Jose, CA, USA, Jan. 27-31, 2008.

Meerwald, P. & Uhl, A. (2010). Robust watermarking of H.264-encoded video: Extension to SVC, *In Proceedings of the Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP '10*, pp. 82-85, Darmstadt, Germany, Oct. 15-17, 2010.

Meerwald, P. & Uhl, A. (2010). Robust watermarking of H.264/SVC-encoded video: quality and resolution scalability, In H.-J. Kim, Y. Shi, M. Barni, editors, *In Proceedings of the 9th International Workshop on Digital Watermarking, IWDW '10*, Lecture Notes in Computer Science, vol. 6526, pp. 159-169, Seoul, Korea, Springer, October 1-3, 2010.

Meerwald, P. (2011). Digital Watermark Detection in Visual Multimedia Content, Ph.D. Thesis, University of Salzburg, Austria, Feb. 2011.

Nithin, T., Bull, D. & Redmill, D. (2009). A novel H.264 SVC encryption scheme for secure bit-rate transcoding, *In Proceedings of the Picture Coding Symposium, PCS'09*, Chicago, IL, USA, May 2009.

Park, S. & Shin, S. (2008). Combined scheme of encryption and watermarking in H.264/Scalable Video Coding (SVC), In New Directions in Intelligent Interactive Multimedia, Springer, Studies in Computational Intelligence, vol. 142, pp. 351–361, Sep. 2008.

Piper, A., Safavi-Naini, R. & Mertins, A. (2004). Coefficient selection methods for scalable spread spectrum watermarking, *IWDW 2003*, pp. 235-246, 2004.

Piper, A., Safavi-Naini, R. & Mertins, A. (2005). Resolution and quality scalable spread spectrum image watermarking, *In Proceedings of the 7th Workshop on Multimedia and Security, MMSEC '05*, pp. 79–90, New York, NY, USA, Aug. 2005.

Piper, A. (2010). Scalable Watermarking for Images, Ph.D. Thesis, School of Computer Science and Software Engineering, University of Wollongong, 2010.

Schierl, T., Hellge, C., Mirta, S., Gruneberg, K. & Wiegand, T. (2007). Using H.264/AVC-based Scalable Video Coding (SVC) for Real Time Streaming in Wireless IP Networks, *Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on*, pp. 3455-3458, 27-30 May 2007.

Schwarz, H.; Marpe, D. & Wiegand, T. (2007). Overview of the scalable video coding extension of the H.264/AVC standard, *IEEE Trans. Circ. Syst. for Video Technol.*, vol. 17, no. 9, pp. 1103–1120, Sept. 2007.

Seo, J. & Park, H. (2005). Data protection of multimedia contents using scalable digital watermarking, *Computer and Information Science, 2005. Fourth Annual ACIS International Conference on*, pp. 376-380, 2005.

Shi, F., Liu, S., Yao, H., Liu, Y. & Zhang, S. (2010). Scalable and Credible Video Watermarking towards Scalable Video Coding, *Advances in Multimedia Information Processing, PCM 2010*, Lecture Notes in Computer Science, vol. 6297/2010, pp. 697-708, 2010.

Stutz, T. & Uhl, A. (2008). Format-compliant encryption of H.264/AVC and SVC, *In Proceedings of the Eighth IEEE International Symposium on Multimedia (ISM'08)*, Berkeley, CA, USA, Dec. 2008.

Stutz, T. & Uhl, A. (2011). Survey of H.264 AVC/SVC Encryption, *Circuits and Systems for Video Technology, IEEE Transactions on*, vol. PP, no. 99, pp. 1-15, 2011.

Wang, C., Lin, Y., Yi, S. & Chen, P. (2006). Digital authentication and verification in MPEG-4 fine-granular scalability video using bit-plane watermarking, *Proc. of Conference on Image Processing, Computer Vision and Pattern Recognition (IPCV'06)*, pp. 16–21, Las Vegas, NV, Jun. 2006.

Wiegand, T. & Sullivan, G. (2003). Final draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), in Joint Video Team (JVT) of ITU-T SG16/Q15 (VCEG) and ISO/IEC JTC1/SC29/WG11, Annex C, Pattaya, Thailand, Mar. 2003, Doc. JVT-G050.

Wiegand, T.; Schwarz, H.; Joch, A.; Kossentini, F. & Sullivan, G. J. (2003). Rate-constrained coder control and comparison of video coding standards, *IEEE Trans. Circuit Syst. Video Technol.*, vol. 13, iss. 7, pp. 688-703, Jul. 2003.

Wiegand, T.; Sullivan, G.; Reichel, J.; Schwarz, H. & Wien, M. (2006). Joint draft 8 of SVC amendment, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6 (JVT-U201), 21st Meeting, Hangzhou, China, Oct. 2006.




## Perceptual Image Hashing

Azhar Hadmi<sup>1</sup>, William Puech<sup>1</sup>, Brahim Ait Es Said<sup>2</sup> and Abdellah Ait Ouahman<sup>2</sup>
<sup>1</sup>*University of Montpellier II, CNRS UMR 5506-LIRMM, France*
<sup>2</sup>*University of Cadi Ayyad, ETRI Team, Morocco*

#### 1. Introduction


With the fast advancement of computer, multimedia and network technologies, the amount of multimedia information that is conveyed, broadcast or browsed via digital devices has grown exponentially. Simultaneously, digital forgery and unauthorized use have reached a significant level that makes multimedia authentication and security very challenging and demanding. The ability to detect changes in multimedia data has become very important for many applications, especially for journalistic photography and medical or artwork image databases. This has spurred interest in developing more robust algorithms and techniques that allow the confidentiality, authenticity and integrity of exchanged multimedia data to be checked. Confidentiality means that the exchanged multimedia data are encrypted and, without the decryption key, unintelligible. Confidentiality is achieved mainly through encryption schemes, either secret key or public key. Authentication is another crucial issue of multimedia data protection: it makes it possible to trace the author of the multimedia data and to determine whether an original multimedia data content was altered in any way from the time of its recording. Integrity allows the detection of degradation of multimedia data and helps make sure that the received multimedia data have not been modified by a third party for malicious reasons. Many attempts have been made to secure multimedia data against illegal use with techniques from different fields, such as encryption, watermarking and perceptual image hashing. The field of encryption is becoming very important in the present era, in which information security is of the utmost concern to provide end-to-end security. Multimedia data encryption has applications in internet communication, multimedia systems, medical imaging, telemedicine, military communication, etc. Although we may use traditional cryptosystems to encrypt multimedia data directly, this is not a good idea for two reasons. The first reason is that the multimedia data size is almost always very large; therefore, traditional cryptosystems need much more time to directly encrypt the multimedia data. The other problem is that the decrypted multimedia data must be equal to the original multimedia data. However, this requirement is not necessary for image/video data. Due to the characteristics of human perception, decrypted multimedia data containing small distortions are usually acceptable. Deciding upon what level of security is needed is harder than it looks. To identify an optimal security level, the cost of the multimedia information to be protected and the cost of the protection itself are to be compared carefully. At present, many image encryption algorithms have been proposed (Ozturk & Ibrahim, 2005;


Puech et al., 2007; Rodrigues et al., 2006). In some algorithms, the secret key and the algorithm cannot be separated effectively. This does not satisfy the requirements of modern cryptographic mechanisms and makes these algorithms prone to various attacks. In recent years, image encryption has been developed to overcome the above disadvantages, as discussed in (Furht et al., 2004; Stinson, 2002). The other field used to secure multimedia data is watermarking. Watermarking schemes have been developed for protecting intellectual property rights; they embed an imperceptible signal, called a watermark, carrying copyright information into multimedia data, *i.e.* an image, to form the watermarked image. The embedded watermark should be robust against malicious attacks so that it can be correctly extracted to show the ownership of the host multimedia data whenever necessary (Bender et al., 1996; Memon & Wong, 1998). A fragile or semi-fragile watermark detects changes of the host multimedia data such that it can provide some form of guarantee that the multimedia data has not been tampered with and originates from the right source. In addition, a fragile watermarking scheme should be able to identify which portions of the watermarked multimedia data are authentic and which are corrupted; if unauthenticated portions are detected, it should be able to restore them (Cox et al., 2002). Watermarking has been widely adopted in many applications that require copyright protection, copy control, image authentication and broadcast monitoring (Cox et al., 2000). Watermarking can be used for copyright checks or content authentication of individual images, but is not suitable when a large-scale search is required. Furthermore, data embedding inevitably causes slight distortion to the host multimedia data (Wang & Zhang, 2007) and changes its content. Recently, researchers in the field of security/authentication of multimedia data have introduced a technique, inspired by cryptographic hash functions, to authenticate multimedia data, called *Perceptual hash functions* or *Perceptual image hashing* in the case of image applications. It should be noted that the objectives of a cryptographic hash function and of a perceptual image hash function are not exactly the same. For example, there is no robustness or tamper localization requirement in the case of a cryptographic hash function (Ahmed & Siyal, 2006). Traditionally, data integrity issues are addressed by cryptographic hashes or message authentication functions, such as MD5 (Rivest, 1992) and the SHA series (NIST, 2008), which are sensitive to every bit of the input message. As a result, the message integrity can be validated only when every bit of the message is unchanged (Menezes et al., 1996). This sensitivity to every bit is not suitable for multimedia data, since the information they carry is mostly retained even when the multimedia data have undergone various content-preserving operations. Therefore, bit-by-bit verification is no longer a suitable method for multimedia data authentication. A rough classification of content-preserving and content-changing manipulations is given in Table 1 (Han & Chu, 2010). Robust perceptual image hashing methods have recently been proposed as primitives to overcome the above problems and have constituted the core of a challenging and developing research area for academia as well as the multimedia industry. Perceptual image hashing functions extract certain features from an image and calculate a hash value based on these features. Such functions have been proposed to establish the "perceptual equality" of image content. Image authentication is performed by comparing the hash values of the original image and the image to be authenticated. Perceptual hashes are expected to be able to survive acceptable content-preserving manipulations and to reject malicious manipulations. In recent years, there has been a growing body of research on perceptual image hashing that is increasingly receiving attention in the literature. A perceptual image hashing system generally consists of four pipeline stages: the *Transformation* stage, the *Feature extraction* stage, the *Quantization* stage and the *Compression and Encryption* stage, as shown in Figure 1. The *Quantization* stage in a perceptual image hashing system is very


important to enhance robustness properties and increase randomness to minimize collision probabilities in a perceptual image hashing system. This step is very difficult, especially if it is followed by the *Compression and Encryption* stage, because we do not know the behavior of the extracted continuous features after content-preserving/content-changing manipulations (examples of manipulations are given in Table 1). For this reason, in most proposed perceptual image hashing schemes, the *Compression and Encryption* stage is ignored.


Table 1. Content-preserving and content-changing manipulations.

In this chapter we analyze the importance of the *Quantization* stage problem in a perceptual image hashing pipeline. This chapter is arranged as follows. In Section 2, a classification of perceptual image hashing methods is presented, followed by an overview of the unifying framework for perceptual image hashing. Then, the basic metrics and important requirements of a perceptual image hashing function are given, wherein a formulation of the perceptual image hashing problem is provided. Then, perceptual hash verification measures are presented, followed by an overview of recently published schemes proposed in the literature. In Section 3, we present the quantization problem in perceptual image hashing systems, and then we discuss the different quantization techniques used for more robustness of a perceptual image hashing scheme, where we show their advantages and their limitations. In Section 4, a new approach for the analysis of the quantization stage is presented based on the theoretical study of Section 3, followed by a presentation and discussion of some obtained experimental results. Finally, Section 5 offers a discussion on the issues addressed and identifies future research directions. The objective of the latter section is to present prospects and challenges in the context of perceptual image hashing.


#### 2. Perceptual image hashing

In this Section, we give a classification of different perceptual image hashing techniques, followed by the presentation of the perceptual image hashing framework; basic requirements related to perceptual image hashing are then discussed. Furthermore, related work is reviewed and the challenging problems that are not yet resolved are identified.

#### 2.1 Perceptual image hashing methods classification

Most of the existing image hashing studies mainly focus on the feature extraction stage and use the extracted features during authentication. They can roughly be classified into the four following categories (Zhu et al., 2010; Han & Chu, 2010):

• *Statistic-based schemes* (Khelifi & Jiang, 2010; Schneider & Chang, 1996; Venkatesan et al., 2000): This group of schemes extracts hash features by calculating the image statistics in the spatial domain, such as the mean, variance, higher moments of image blocks and the histogram.

• *Relation-based schemes* (Lin & Chang, 2001; Lu & Liao, 2003): This category of approaches extracts hash features by making use of some invariant relationships of the coefficients of the discrete cosine transform (DCT) or wavelet transform (DWT).

• *Coarse-representation-based schemes* (Fridrich & Goljan, 2000; Kozat et al., 2004; Mihçak & R.Venkatesan, 2001; Swaminathan et al., 2006): In this category of methods, the perceptual hashes are calculated by making use of coarse information of the whole image, such as the spatial distribution of significant wavelet coefficients, the low-frequency coefficients of the Fourier transform, and so on.

• *Low level feature-based schemes* (Bhattacharjee & Kutter, 1998; Monga & Evans, 2006): The hashes are extracted by detecting the salient image feature points. These methods first perform the DCT or DWT transform on the original image, and then directly make use of the coefficients to generate final hash values. However, these hash values are very sensitive to global as well as local distortions that do not cause perceptually significant changes to the images.


#### 2.2 Perceptual image hashing framework

A perceptual image hashing system, as shown in Fig. 1, generally consists of four pipeline stages: the *Transformation* stage, the *Feature extraction* stage, the *Quantization* stage and the *Compression and Encryption* stage.

In the *Transformation* stage, the input image undergoes spatial and/or frequency transformation to make all extracted features depend on the values of image pixels or the image frequency coefficients. In the *Feature Extraction* stage, the perceptual image hashing system extracts the image features from the input image to generate the continuous hash vector. Then, the continuous perceptual hash vector is quantized into the discrete hash vector in the *Quantization* stage. The third stage converts the discrete hash vector into the binary perceptual hash string. Finally, the binary perceptual hash string is compressed and encrypted into a short and final perceptual hash in the *Compression and Encryption* stage (Figure 1).


Fig. 1. Four pipeline stages of a perceptual image hashing system.
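The framework can be read as a simple function composition. The sketch below is only an illustration, not a prescribed implementation: it chains the four stages of Figure 1 as interchangeable callables, and the concrete stage functions sketched in Sections 2.2.1–2.2.4 below are likewise illustrative assumptions.

```python
from typing import Callable
import numpy as np

def perceptual_hash(image: np.ndarray,
                    transform: Callable[[np.ndarray], np.ndarray],
                    extract_features: Callable[[np.ndarray], np.ndarray],
                    quantize: Callable[[np.ndarray], np.ndarray],
                    compress_encrypt: Callable[[np.ndarray], bytes]) -> bytes:
    """Chain the four pipeline stages of Figure 1."""
    coeffs = transform(image)            # 2.2.1 Transformation
    features = extract_features(coeffs)  # 2.2.2 Feature extraction (L x p floats)
    symbols = quantize(features)         # 2.2.3 Quantization (discrete hash vector)
    return compress_encrypt(symbols)     # 2.2.4 Compression and Encryption (l bytes)
```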

#### 2.2.1 Transformation stage

In the *Transformation* stage, the input image of size *M* × *N bytes* undergoes spatial transformations such as color transformation, smoothing, affine transformations, etc., or frequency transformations such as the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), etc. When the DWT transformation is applied, most perceptual image hashing schemes take into account just the LL subband, because it is a coarse version of the original image and contains all the perceptually relevant information. The principal aim of those transformations is to make all extracted features, in the *Feature Extraction* stage, depend upon the values of image pixels or the image frequency coefficients in the frequency space.
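As a minimal illustration of such a transformation, assuming the PyWavelets package is available, one can keep only the LL subband of a one-level DWT:

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def transform(image: np.ndarray) -> np.ndarray:
    """One-level 2D DWT; only the LL (approximation) subband is kept."""
    ll, (lh, hl, hh) = pywt.dwt2(image.astype(float), "haar")
    return ll
```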

#### 2.2.2 Feature Extraction stage

In the *Feature Extraction* stage, the image hashing system extracts the image features from the transformed image to generate the feature vector of *L* features, where *L* << *M* × *N*. Note that each feature can contain *p* elements of type *float*, which means that we get *L* × *p* floats at this stage. It is still an open question, however, which mappings (if any) from DCT/DWT coefficients preserve the essential information about an image for hashing and/or mark embedding applications. We can at this stage add another feature selection step, as shown in Fig. 2, so that only the most pertinent features are selected, i.e., those that are statistically more resistant against a specific allowed manipulation like addition of noise, image rotation, etc. The selected features can be presented as an intermediate hash vector of *K* × *p* floats, where *K* < *L*.

Fig. 2. Selection of the most relevant features in the Feature Extraction stage.
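A simple, purely illustrative extractor in the statistic-based spirit of Section 2.1 could use block means of the transformed image; the block size and the choice of the mean are assumptions made only for the sketch.

```python
import numpy as np

def extract_features(ll: np.ndarray, block: int = 16) -> np.ndarray:
    """Return L block means (floats), with L << M*N, as a continuous feature vector."""
    h, w = ll.shape
    h, w = h - h % block, w - w % block           # crop to a multiple of the block size
    tiles = ll[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3)).ravel()        # one float per block
```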

#### 2.2.3 Quantization stage

In the next stage, the *Quantization* stage, we get a quantized intermediate perceptual hash vector which contains *L* × *p* elements of type *byte*. Uniform quantization can be applied to quantize each component of the continuous perceptual hash vector. Adaptive quantization (Mihçak & R.Venkatesan, 2001) is another quantization type, which is the most famous quantization scheme in the field of image hashing. The difference between the two quantization schemes is that the partition of uniform quantization is based on the interval length of the hash values, whereas the partition of adaptive quantization is based on the probability density function (pdf) of the hash values. This kind of quantization is detailed in Section 3.
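Both options can be sketched as follows: uniform quantization uses equal-width bins over the observed feature range, while an adaptive, pdf-based variant (loosely in the spirit of Mihçak & R.Venkatesan, 2001) places bin edges at empirical quantiles. The functions below are simplified illustrations, not the published algorithms.

```python
import numpy as np

def quantize_uniform(features: np.ndarray, levels: int = 256) -> np.ndarray:
    """Equal-width bins over the observed feature range."""
    lo, hi = features.min(), features.max()
    q = np.floor((features - lo) / (hi - lo + 1e-12) * levels)
    return np.clip(q, 0, levels - 1).astype(np.uint8)

def quantize_adaptive(features: np.ndarray, levels: int = 256) -> np.ndarray:
    """Bin edges placed at empirical quantiles, so each bin is equally populated."""
    edges = np.quantile(features, np.linspace(0.0, 1.0, levels + 1)[1:-1])
    return np.digitize(features, edges).astype(np.uint8)
```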


#### 2.2.4 Compression and Encryption stage

The *Compression and Encryption* stage is the final step of a perceptual image hashing system: the binary intermediate perceptual hash string is compressed and encrypted into a short perceptual hash of fixed size of *l bytes*, where *l* << *L* × *p*, which constitutes the final perceptual hash that allows image verification and authentication at the receiver. This stage can be ensured by cryptographic hash functions, e.g. the SHA series, which generate a final hash of fixed size (a hash of 160 bits in the case of SHA-1).
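A minimal realization of this stage is a keyed SHA-1 over the quantized symbols, here via HMAC from the Python standard library; the use of HMAC is an assumption made for the sketch, as the chapter only requires a cryptographic hash such as SHA-1.

```python
import hashlib
import hmac
import numpy as np

def compress_encrypt(symbols: np.ndarray, key: bytes) -> bytes:
    """Keyed SHA-1 over the discrete hash vector; output is a fixed 160-bit hash."""
    return hmac.new(key, symbols.astype(np.uint8).tobytes(), hashlib.sha1).digest()
```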

In the next section, we give the most important requirements that a perceptual image hashing function must achieve and show how they conflict with each other.

#### 2.3 Metrics and important requirements of a perceptual image hashing

Perceptual hash functions can be categorized into two categories: unkeyed perceptual hash functions and keyed perceptual hash functions. An unkeyed perceptual hash function *H*(*x*) generates a hash value *h* from an arbitrary input *x* (that is, *h* = *H*(*x*)). A keyed perceptual hash function generates a hash value *h* from an arbitrary input *x* and a secret key *k* (that is, *h* = *H*(*x*; *k*)). The design of efficient robust perceptual image hashing techniques is a very challenging problem that should address the compromise between various conflicting requirements. Let *P* denote probability. Let *H*() denote a perceptual hash function which takes one image as input and produces a binary string of length *l*. Let *I* denote a particular image and *Iident* denote a modified version of this image which is "perceptually similar" to *I*. Let *Idiff* denote an image that is "perceptually different" from *I*. Let *h*1 and *h*2 denote hash values of the original image *I* and the perceptually different image *Idiff*, respectively. {0, 1}*l* represents the set of binary strings of length *l*. Then the four desirable properties of a perceptual image hashing function are identified as follows:

• Equal distribution (unpredictability) of hash values:

$$P(H(I) = h_1) \approx \frac{1}{2^l}, \qquad \forall h_1 \in \{0, 1\}^l \tag{1}$$

• Pairwise independence for perceptually different images *I* and *Idiff*:

$$P(H(I) = h_1 \mid H(I_{\text{diff}}) = h_2) \approx P(H(I_{\text{ident}}) = h_1), \qquad \forall h_1, h_2 \in \{0, 1\}^l \tag{2}$$

• Invariance for perceptually similar images *I* and *Iident*:

$$P(H(I) = H(I_{\text{ident}})) \ge 1 - \epsilon_1, \qquad \text{for a given } \epsilon_1 \approx 0 \tag{3}$$

• Distinction of perceptually different images *I* and *Idiff*:

$$P(H(I) \neq H(I_{\text{diff}})) \ge 1 - \epsilon_2, \qquad \text{for a given } \epsilon_2 \approx 0 \tag{4}$$


To meet the property in equation (3), most perceptual hash functions try to extract features of images which are invariant under insignificant global modifications such as compression or enhancement. Equation (4) means that, given an image *I*, it should be nearly impossible for an adversary to construct a perceptually different image *Idiff* such that *H*(*I*) = *H*(*Idiff*). This property can be hard to achieve because the features used by published perceptual hash functions are publicly known (Kerckhoffs, 1883; Mihçak & R.Venkatesan, 2001). Moreover, it tends to cause the property in equation (3) to be neglected in favor of the property in equation (4). Likewise, for perfect unpredictability, an equal distribution (equation (1)) of the hash values is needed. This would hinder achieving the property in equation (3) (Monga, 2005). Depending on the application, perceptual hash functions have to achieve these conflicting properties to some extent and/or facilitate trade-offs. From a practical point of view, both robustness and security are important. Lack of robustness (equation (3)) renders an image hash useless, as explained above, while security (equations (1), (4)) means that it is extremely difficult for an adversary to modify the essential content of an image yet keep the hash value unchanged. Thus, trade-offs must be sought, and this usually forms the central issue of perceptual image hashing research.

#### 2.4 Perceptual hash verification

A perceptual image hashing system calculates hashes that must be equal for similar images. Referring to the image space as shown in Figure 3, let *I* denote an image, and *X* denote the set of images *Iident* that are modified from *I* by means of content-preserving manipulations and are defined to be perceptually similar to *I*. Let *Y* contain all other images *Idiff* that are irrelevant to *I* and its perceptually similar versions. *Idiff* are the results of content-changing manipulations. Consequently, {*I*} ∪ *X* ∪ *Y* forms an entire image space. Let *h*, *hident* and *hdiff* denote hash values of the original image *I*, the perceptually similar image *Iident* and the perceptually different image *Idiff*, respectively. In a robust and secure perceptual image hashing system, the following properties are required when the Encryption and Compression stage is applied: h = h*ident* **for all identical images** I*ident* ∈ X and h ≠ h*diff* **for all different images** I*diff* ∈ Y (Figure 3). Since the requirement of bit-by-bit hash equality is usually hard to achieve, most of the proposed schemes compute distances and similarities between perceptual hashes. The most often used are the Bit Error Rate (BER), the Hamming distance and the Peak of Cross Correlation (PCC). The first two measure the distance between two hash values, whereas the latter measures the similarity between two hash values. Using these measures, the sender determines a threshold τ. The proper selection of τ is very important, as it defines the boundary between content-preserving and content-changing manipulations.

Let *d*(., .) denote the used measure, *i.e.* a normalized Hamming distance function. Let *h*, *hident* and *hdiff* denote hash values of the original image *I*, the perceptually similar image *Iident* and the perceptually different image *Idiff*, respectively. The error-resilience of multimedia data hashing is defined as follows. *Iident* is successfully identified to be perceptually similar to *I* if *d*(*h*, *hident*) ≤ τ holds. In other words, if two images are perceptually similar, their corresponding hashes need to be highly correlated. If *d*(*h*, *hdiff*) > τ, then *Idiff* is identified as modified from *I* by means of content-changing manipulations. Overall, the main theme of perceptual image hashing is to develop a robust perceptual image hash function that can identify perceptually similar multimedia contents and reject content-changing manipulations.

Fig. 3. The image space {*I*} ∪ *X* ∪ *Y* formed by an image {*I*}, its perceptually similar versions set *X* and its modified version set *Y*.
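A small sketch of this verification step, using a normalized Hamming distance between two binary hashes and a sender-chosen threshold (the value 0.2 below is purely illustrative), could look as follows:

```python
def hamming_distance(h1: bytes, h2: bytes) -> float:
    """Normalized Hamming distance between two equal-length binary hashes."""
    assert len(h1) == len(h2)
    diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
    return diff_bits / (8 * len(h1))

def is_perceptually_similar(h: bytes, h_test: bytes, tau: float = 0.2) -> bool:
    """Accept the test image as content-preserving if d(h, h_test) <= tau."""
    return hamming_distance(h, h_test) <= tau
```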

#### 2.5 Review of some related work on perceptual image hashing techniques

In recent years, there has been a growing body of research on perceptual image hashing that is increasingly receiving attention in the literature. Most of these existing papers focus on studies of the feature extraction stage, because extracting a set of robust features that resist, and stay relatively constant under, content-preserving manipulations, while at the same time detecting content-changing manipulations, is the most important objective in a perceptual image hashing system. Few papers address perceptual image hashing system security. In (Fridrich, 2000), the extraction of the hash is based on the projection of image coefficients onto filtered pseudo-random patterns. The final perceptual hash is used for generating pseudo-random watermark sequences, which depend sensitively on a secret key yet continuously on the image, for authentication and integrity verification of still images. In (Venkatesan et al., 2000), a perceptual image hashing technique based on statistics computed from randomized rectangles in the discrete wavelet domain (DWT) is presented. Averages or variances of the rectangles are calculated and quantized with randomized rounding to obtain the hash in the form of a binary string. The quantized statistics are then sent to an error-correcting decoder to generate the final hash value. Statistical properties of wavelet subbands are generally robust against attacks, but they are only loosely related to the image contents and therefore rather insensitive to tampering. This method has been shown to be robust against common image manipulations and geometric attacks. The method proposed in (Schneider & Chang, 1996) uses the intensity histogram to sign the image. Since the global histogram does not contain any spatial information, the authors divide the image into blocks, which can have variable sizes, and compute the intensity histogram for each block separately. This allows some spatial information to be incorporated into the signature. The method in (Fridrich & Goljan, 2000) is based on the observation of the low frequency DCT coefficients. If a low frequency DCT coefficient of an image is small in absolute value, it cannot be made large without causing visible changes to the image. Similarly, if the absolute value of a low frequency coefficient is large, it cannot be changed to a small value without influencing the image significantly. To make the procedure dependent on a key, the DCT modes are replaced with DC-free random smooth patterns generated from a secret key. Other researchers have used other techniques to perform perceptual image hashing. The authors in (Swaminathan et al., 2006) used the Fourier-Mellin transform for perceptual image hashing applications. Using the Fourier-Mellin transform's scale invariant property, the magnitudes of the Fourier transform coefficients were randomly weighted and summed. However, since the Fourier transform does not offer localized frequency information, this method was not able to detect malicious local modifications. In a more recent development, a perceptual image hashing scheme based on the Radon Transform is proposed in (Lei et al., 2011), where the authors perform the Radon Transform on the image and calculate the moment features, which are invariant to translation and scaling in the projection space. Then the Discrete Fourier Transform (DFT) is applied on the moment features to resist rotation. Finally, the magnitude of the significant DFT coefficients is normalized and quantized as the final perceptual image hash. The proposed method can tolerate almost all the typical image processing manipulations, including JPEG compression, geometric distortion, blur, addition of noise and enhancement. The Radon transform was first used in (Lefebvre et al., 2002), and further expanded in (Seo et al., 2004). The authors in (Guo & Hatzinakos, 2007) propose a perceptual image hashing scheme based on the combination of the discrete wavelet transform (DWT) and the Radon Transform. Taking advantage of the frequency localization property of the DWT and the shift/rotation invariant property of the Radon transform, the algorithm can effectively detect malicious local changes and, at the same time, be robust against content-preserving modifications. The obtained features derived from the Radon Transform are then quantized by the probabilistic quantization of (Mihçak & Venkatesan, 2001) to form the final perceptual hash.

In this Section, we have presented some reviews of different schemes proposed in the field of perceptual image hashing. In Section 3, we develop the quantization problem in perceptual image hashing and we present some approaches to address this problem, which still have limitations in practice.

#### 3. Quantization problem in perceptual image hashing

#### 3.1 Problem statement

The goal of the quantization stage, in the perceptual image hashing system, is to discretize the continuous intermediate hash vector (continuous features) into a discrete intermediate hash vector (discrete features). This step is very important to enhance robustness properties and increase randomness in order to minimize collision probabilities of a perceptual image hashing system. Quantization is the conventional way to achieve this goal. The quantization step is difficult because we do not know how the values in the continuous intermediate hash fall, after content-preserving (non-malicious) manipulations, within each quantization interval *Q*. This difficulty of an efficient quantization increases further when it is followed by an encryption and compression stage, *i.e.* SHA-1, because the discrete intermediate hash vectors must be quantized in a correct way for all perceptually similar images. For this reason this stage is ignored in most schemes presented in the literature. To understand the quantization problem statement, let us suppose that the incidental distortion introduced by content-preserving manipulations can be modeled as noise whose maximum absolute magnitude is denoted as *B*, which means that the maximum range of additive noise is *B*. Suppose that the original scalar value *xl* ∈ **R** for *l* ∈ {1, ..., *L*} of the continuous intermediate hash is bounded to a finite interval [−*A*, *A*]. Furthermore, suppose that we wish to obtain a quantized message *q*(*xl*) of *xl* on *P* quantization points *μ*1, ..., *μP*. The points are uniformly spaced such that *Q* = *μj* − *μj*−1 = 2*A*/(*P* − 1) for *j* ∈ {2, ..., *P*}. Now suppose *xl* ∈ [*μj*, *μj*+1); then it will be quantized as *μj*. However, when this value is corrupted by noise addition, the distorted value could drop into the previous quantization interval [*μj*−1, *μj*) or into the next interval [*μj*+1, *μj*+2), and it will be quantized as *μj*−1 or *μj*+1, respectively, so the quantized value of *xl* will not remain *μj* before and after noise addition. Thus, the noise corruption will cause a different quantization result and automatically cause different perceptual hashes (Hadmi et al., 2010). Figure 4 shows the distribution of the original DWT

In recent years, there has been a growing body of research on perceptual image hashing that is increasingly receiving attention in the literature. Most of these existing papers focus on studies of the feature extraction stage because they believe that extracting a set of robust features that resist, and to stay relatively constant, content-preserving manipulations and at the same time should detect content-changing manipulations is the most important objective in perceptual image hashing system. Few papers address perceptual image hashing system security. In (Fridrich, 2000), the extraction of the hash is based on the projection of image coefficients onto filtered pseudo-random patterns. The final perceptual hash is used for generating a pseudo-random watermark sequences, that depend sensitively on a secret key yet continuously on the image, for authentication and integrity verification of still images. In (Venkatesan et al., 2000), a perceptual image hashing technique based on statistics computed from randomized rectangles in the discrete wavelet domain (DWT) is presented. Averages or variances of the rectangles are then calculated and quantized with randomized rounding to obtain the hash in the form of a binary string. The quantized statistics are then sent to an error-correcting decoder to generate the final hash value. Statistical properties of wavelet subbands are generally robust against attacks, but they are only loosely related to the image contents therefore rather insensitive to tampering. This method has been shown to be robust against common image manipulations and geometric attacks. The proposed method in (Schneider & Chang, 1996) is using the intensity histogram to sign the image. Since the global histogram does not contain any spatial information, the authors divide the image into blocks, which can have variable sizes, and compute the intensity histogram for each block separately. This allows some spatial information to be incorporated into the signature. The method in (Fridrich & Goljan, 2000) is based on the observation of the low frequency DCT coefficient. If a low frequency DCT coefficient of an image is small in absolute value, it cannot be made large without causing visible changes to the image. Similarly, if the absolute value of a low frequency coefficient is large, it cannot change it to a small value without influencing the image significantly. To make the procedure dependent on a key, the DCT modes are replaced with DC-free random smooth patterns generated from a secret key. Other researchers have used others techniques to perform image perceptual hashing. Authors in (Swaminathan et al., 2006) used Fourier-Mellin transform for perceptual image hashing applications. Using Fourier-Mellin transform's scale invariant property, the magnitudes of the Fourier transform coefficients were randomly weighted and summed. However, since Fourier transform did not offer localized frequency information, this method was not able to detect malicious local modifications. In a more recent development, a perceptual image hashing

2.5 Review of some related work on perceptual image hashing techniques

versions set *X* and its modified version set *Y*.

In a more recent development, a perceptual image hashing scheme based on the Radon transform is proposed in (Lei et al., 2011), where the authors perform the Radon transform on the image and calculate moment features which are invariant to translation and scaling in the projection space. The Discrete Fourier Transform (DFT) is then applied on the moment features to resist rotation. Finally, the magnitude of the significant DFT coefficients is normalized and quantized as the final perceptual image hash. The proposed method can tolerate almost all typical image processing manipulations, including JPEG compression, geometric distortion, blur, addition of noise and enhancement. The Radon transform was first used in (Lefebvre et al., 2002), and further expanded in (Seo et al., 2004). The authors in (Guo & Hatzinakos, 2007) propose a perceptual image hashing scheme based on the combination of the discrete wavelet transform (DWT) and the Radon transform. Taking advantage of the frequency localization property of the DWT and the shift/rotation invariance property of the Radon transform, the algorithm can effectively detect malicious local changes and, at the same time, be robust against content-preserving modifications. The features derived from the Radon transform are then quantized by the probabilistic quantization of (Mihçak & Venkatesan, 2001) to form the final perceptual hash.

In this section, we have reviewed several schemes proposed in the field of perceptual image hashing. In Section 3, we develop the quantization problem in perceptual image hashing and present some approaches that address this problem, which nevertheless have limitations in practice.

#### 3. Quantization problem in perceptual image hashing

#### 3.1 Problem statement

The goal of the quantization stage in a perceptual image hashing system is to discretize the continuous intermediate hash vector (continuous features) into a discrete intermediate hash vector (discrete features). This step is very important to enhance the robustness properties and to increase randomness, so as to minimize the collision probability of a perceptual image hashing system. Quantization is the conventional way to achieve this goal. The quantization step is difficult because we do not know how the values of the continuous intermediate hash move within each quantization interval $Q$ after content-preserving (non-malicious) manipulations. The difficulty of an efficient quantization increases further when it is followed by an encryption and compression stage, *e.g.* SHA-1, because the discrete intermediate hash vectors must be quantized identically for all perceptually similar images. For this reason, this stage is ignored in most schemes presented in the literature. To state the quantization problem, suppose that the incidental distortion introduced by content-preserving manipulations can be modeled as noise whose maximum absolute magnitude is denoted $B$, which means that the maximum range of the additive noise is $B$. Suppose that the original scalar value $x_l \in \mathbf{R}$, for $l \in \{1, \ldots, L\}$, of the continuous intermediate hash is bounded to a finite interval $[-A, A]$. Furthermore, suppose that we wish to obtain a quantized message $q(x_l)$ of $x_l$ over $P$ quantization points given by the set $\Lambda = \{\lambda_1, \ldots, \lambda_P\}$. The points are uniformly spaced such that $Q = \lambda_j - \lambda_{j-1} = 2A/(P-1)$ for $j \in \{2, \ldots, P\}$. Now suppose $x_l \in [\lambda_j, \lambda_{j+1})$; then it will be quantized as $\lambda_j$. However, when this value is corrupted by noise addition, the distorted value could drop into the previous quantization interval $[\lambda_{j-1}, \lambda_j)$ or into the next interval $[\lambda_{j+1}, \lambda_{j+2})$, and it will be quantized as $\lambda_{j-1}$ or $\lambda_{j+1}$, respectively, so the quantized value of $x_l$ does not remain equal to $\lambda_j$ before and after noise addition. Thus, the noise corruption causes a different quantization result and therefore different perceptual hashes (Hadmi et al., 2010).
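To make the boundary effect concrete, here is a minimal Python sketch (not taken from the chapter; the interval, noise range and sample count are illustrative assumptions) that quantizes a set of bounded features with a uniform step and measures how many of them change bins after a small additive perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)

A, P = 50.0, 26                      # features bounded to [-A, A], P quantization points
Q = 2 * A / (P - 1)                  # uniform spacing Q = 2A/(P-1)

def quantize(x):
    """Index of the quantization interval that contains x."""
    return np.clip(np.floor((x + A) / Q).astype(int), 0, P - 1)

x = rng.uniform(-A, A, 10_000)       # continuous intermediate hash values
B = 1.5                              # maximum range of the content-preserving noise
noisy = x + rng.uniform(-B / 2, B / 2, x.size)

changed = np.mean(quantize(noisy) != quantize(x))
print(f"fraction of features whose quantized value changed: {changed:.3f}")
```

Even though the perturbation is much smaller than $Q$, a non-negligible fraction of features falls on the wrong side of a decision boundary, which is exactly the problem analyzed below.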

Fig. 4. The influence of additive Gaussian noise on the quantization ($Q = 2$) of the original DWT LL-subband coefficients and their noisy version in the interval [40, 50]. In green: DWT LL-subband quantized coefficients that dropped from the right neighboring quantization interval. In red: DWT LL-subband quantized coefficients that dropped from the left neighboring quantization interval.

Figure 4 shows the distribution of the original DWT LL-subband (level 3) coefficients of the Lena image, sized 1024 × 1024, in the interval [40, 50], together with their noisy version in the same interval, obtained with an additive Gaussian noise of standard deviation $\sigma = 1$. When applying a Gaussian noise with $\sigma = 1$, the noisy image remains visually the same as the original image; however, the noise changes the distribution of the extracted features, as we can see in Figure 4. This causes errors in the quantization step, because the quantized features do not remain unchanged after noise addition, as shown in Figure 4. To avoid such cases, many quantization schemes have been proposed in the literature. The authors in (Sun & Chang, 2005) propose an error correction coding (ECC) to correct the errors of the extracted features caused by the additive noise, so as to obtain the same quantization result before and after the noise. In their work, they assume that the quantization step satisfies $Q > 4B$, which is not always true from a practical point of view, and they push the feature values away from the quantization decision boundaries to create a margin of at least $Q/4$, so that an original value $x_l$, when later contaminated, does not cross a decision boundary. The concept of error correction is illustrated in Figure 5. The original feature $P$ is quantized to $nQ$ before adding noise, but after adding noise there is a possibility that the noisy feature value drops into the range $[(n-1)Q, (n-0.5)Q)$ and is quantized as $(n-1)Q$. As a solution, the authors propose to add or subtract $0.25Q$ so that the feature stays in the range $[(n-0.5)Q, (n+0.5)Q)$, and the quantized value then remains the same as the original quantized value $nQ$ even after adding noise.
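The margin idea can be sketched in a few lines of Python (our own illustration, not the authors' code; in the actual scheme the applied adjustment is also recorded so it can be checked at verification time):

```python
import numpy as np

def push_away_from_boundaries(x, Q):
    """Move each feature towards the centre of its quantization cell so that it keeps
    a margin of at least Q/4 from the decision boundaries, as in (Sun & Chang, 2005)."""
    n = np.round(x / Q)                      # nearest quantization level n*Q
    offset = x - n * Q                       # position inside the cell, in [-Q/2, Q/2)
    return np.where(offset >  Q / 4, x - Q / 4,       # too close to the upper boundary
           np.where(offset < -Q / 4, x + Q / 4, x))   # too close to the lower boundary

Q = 8.0
features = np.array([3.9, 4.1, 11.7])        # 4.1 and 11.7 sit close to boundaries at 4 and 12
print(push_away_from_boundaries(features, Q))
```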

Fig. 5. Illustration of the concept of error correction in Sun's scheme (Sun & Chang, 2005).

Other similar work based on this approach has recently been proposed in (Ahmed et al., 2010), where the authors calculate and record a 4-bit vector called "perturbation information". This additional transmitted information has the same dimension as the extracted features. It is used at the receiver's end to adjust the intermediate hash during the image verification stage, before performing quantization. The information carried in the "perturbation information" therefore helps to decide whether to positively authenticate an image or not. Their theoretical analysis is more general than that of (Sun & Chang, 2005) from a practical point of view. One main disadvantage of such schemes is that the vectors used to correct the errors of the extracted features need to be transmitted or stored beside the image and the final hash, as shown in Figures 6 and 7.

Fig. 6. Hash generation module with quantization in Fawad's scheme (Ahmed et al., 2010).

Fig. 7. Image verification module with quantization in Fawad's scheme (Ahmed et al., 2010).

Another quantization scheme, widely applied in perceptual image hashing (Swaminathan et al., 2006; Zhu et al., 2010), was proposed by (Mihçak & Venkatesan, 2001) and is called *Adaptive Quantization*, or *Probabilistic Quantization* in (Monga, 2005). Its key property is that it takes the distribution of the input data into account. The quantization intervals $Q = \lambda_j - \lambda_{j-1}$ for $j \in \{1, \ldots, P\}$ are designed so that $\int_{\lambda_{j-1}}^{\lambda_j} p_X(x)\,dx = 1/P$, where $P$ is the number of quantization levels and $p_X(\cdot)$ is the pdf of the input data $X$. The central points $\{C_j\}$ are defined so that $\int_{\lambda_{j-1}}^{C_j} p_X(x)\,dx = \int_{C_j}^{\lambda_j} p_X(x)\,dx = 1/(2P)$. Around each $\lambda_j$, a randomization interval $[A_j, B_j]$ is introduced such that $\int_{A_j}^{\lambda_j} p_X(x)\,dx = \int_{\lambda_j}^{B_j} p_X(x)\,dx = r/P$, where $r \le 1/2$. The randomization interval is therefore symmetric around $\lambda_j$, for all $j$, in terms of the distribution $p_X$. The natural constraints $C_j \le A_j$ and $B_j \le C_{j+1}$ must be respected. The overall quantization rule is then given by:


$$q(x_l) = \begin{cases} j-1 & \text{w.p. } 1 \quad \text{if } C_j \le x_l < A_j, \\[4pt] j-1 & \text{w.p. } \frac{P}{2r}\int_{x_l}^{B_j} p_X(t)\,dt \quad \text{if } A_j \le x_l < B_j, \\[4pt] j & \text{w.p. } \frac{P}{2r}\int_{A_j}^{x_l} p_X(t)\,dt \quad \text{if } A_j \le x_l < B_j, \\[4pt] j & \text{w.p. } 1 \quad \text{if } B_j \le x_l < C_{j+1}. \end{cases} \tag{5}$$

where w.p. stands for "with probability".

The discrete scheme of *Adaptive Quantization* has recently been developed by (Zhu et al., 2010) to make it applicable in practice.
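As an illustration only (our own sketch, with an empirical-quantile construction standing in for the exact design of (Mihçak & Venkatesan, 2001)), the following Python code builds such a quantizer from sample data and applies the randomized rule of Equation (5):

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_quantizer(samples, P=8, r=0.25):
    """Sketch of the probabilistic ('adaptive') quantizer described above.

    Interior boundaries lambda_1..lambda_{P-1} are empirical quantiles, so every bin
    carries about 1/P of the mass; around each boundary a randomization band [A_j, B_j]
    carries r/P of the mass on each side, and inside the band the bin index is drawn
    with probability proportional to the enclosed mass (Equation 5)."""
    s = np.sort(samples)
    N = s.size
    ecdf = lambda x: np.searchsorted(s, x, side="right") / N   # empirical CDF
    ranks = np.arange(1, P) / P
    lam = np.quantile(s, ranks)                                # boundaries lambda_j
    A = np.quantile(s, ranks - r / P)                          # band left ends
    B = np.quantile(s, ranks + r / P)                          # band right ends

    def q(x):
        x = np.asarray(x, dtype=float)
        j = np.searchsorted(lam, x)                            # deterministic bin index
        for i in range(P - 1):
            in_band = (x >= A[i]) & (x < B[i])
            p_up = np.clip((ecdf(x) - ecdf(A[i])) * P / (2 * r), 0.0, 1.0)
            j = np.where(in_band, i + (rng.random(x.shape) < p_up), j)
        return j
    return q

data = rng.normal(0.0, 1.0, 20_000)
q = adaptive_quantizer(data, P=8, r=0.25)
print(np.bincount(q(data), minlength=8) / data.size)           # roughly 1/P per bin
```

Each output bin receives roughly $1/P$ of the samples, and values falling inside a randomization band are assigned to one of the two adjacent bins at random, which is what injects the desired randomness into the hash.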

#### 3.2 Theoretical analysis

In this section, we statistically analyze the behavior of the extracted features under additive uniform noise (Section 3.2.1) and additive Gaussian noise (Section 3.2.2), as well as the probability of a false quantization of these selected features. The main goal of this analysis is to characterize the theoretical behavior of the extracted image features to be hashed against the content-preserving/content-changing manipulations that an image may undergo, simulated here by an additive noise (Hadmi et al., 2011).

#### 3.2.1 Case of an additive uniform noise

To analyze the influence of an additive noise on the robustness of perceptual image hashing, we carry out a statistical analysis of the quantization problem. The idea is to compute the length of the quantization interval for a noise whose maximum absolute magnitude is $B$, which represents the content-preserving manipulations, and a previously fixed probability, denoted $P_{drop}$, that a value in this interval drops out of it.

To address this problem, we start by computing the convolution product of two distributions defined as follows:

• Let $P_\Delta(x)$ denote the extracted feature distribution limited to an interval $[a, b]$ of length $\Delta = b - a$. $P_\Delta(x)$ is given by:

$$P_\Delta(x) = \begin{cases} \frac{1}{\Delta} & \text{for } x \in [a, b], \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{6}$$

• Let $P_B(x)$ denote the probability density function of the continuous uniform noise, which represents the content-preserving manipulations, on the interval $[-\frac{B}{2}, \frac{B}{2}]$, with $B < \Delta$. $P_B(x)$ is expressed as:

$$P_B(x) = \begin{cases} \frac{1}{B} & \text{for } x \in [-\frac{B}{2}, \frac{B}{2}], \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{7}$$



The convolution product $h(x)$ of $P_\Delta(x)$ and $P_B(x)$ is:

$$h(x) = \int_{-\infty}^{+\infty} P_\Delta(y)\, P_B(x - y)\, dy = \int_{a}^{b} \frac{1}{\Delta}\, P_B(x - y)\, dy \tag{8}$$

Finally, we get the convolution product *h*(*x*) (equation (9)) expressed as:

$$h(x) = \begin{cases} 0 & \text{for } x \le a - \frac{B}{2}, \\[4pt] \frac{1}{\Delta B}\left(x + \frac{B}{2} - a\right) & \text{for } x \in \left]a - \frac{B}{2},\; a + \frac{B}{2}\right], \\[4pt] \frac{1}{\Delta} & \text{for } x \in \left]a + \frac{B}{2},\; b - \frac{B}{2}\right], \\[4pt] \frac{1}{\Delta B}\left(-x + \frac{B}{2} + b\right) & \text{for } x \in \left]b - \frac{B}{2},\; b + \frac{B}{2}\right], \\[4pt] 0 & \text{for } x > b + \frac{B}{2}. \end{cases} \tag{9}$$

An example of $h(x)$ is presented in Figure 8, with $B < \frac{\Delta}{2}$.

Fig. 8. Convolution product of $P_\Delta(x)$ and $P_B(x)$.

Suppose that $y$ represents an extracted feature lying in the interval $[a, b]$, and let $P_{drop}$ be the probability that $y$ drops out of $[a, b]$ because of the added noise $B$. Then $P_{drop}(y)$ is calculated and expressed as follows (Equation 10):

$$\begin{aligned} P_{drop}(y) &= P(y \notin [a, b]) \\ &= \int_{a-\frac{B}{2}}^{a} h(x)\, dx + \int_{b}^{b+\frac{B}{2}} h(x)\, dx \\ &= \frac{B}{4\Delta} \end{aligned} \tag{10}$$

Equation (10) gives information about the behavior of the extracted features after adding noise. For example, for a uniform noise of length $B = 4 \times 10^{-2}$, if we want $P_{drop} = 10^{-3}$, then the length of the quantization interval that must be chosen is $\Delta = 10$.
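This value follows directly by rearranging Equation (10):

$$\Delta = \frac{B}{4\,P_{drop}} = \frac{4 \times 10^{-2}}{4 \times 10^{-3}} = 10 .$$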

To compare the theoretical probability that extracted features drop out of the quantization interval, given by Equation 10, with the experimental probability, we applied a continuous uniform noise of different lengths, from $B = 0$ to $B = 50$, to the same $N = 10000$ samples in the interval $[-10, 10]$, and then calculated the probability $P_{drop}$ for each noise length. We note that the experimental results presented in Figure 9 coincide with the theoretical results calculated from Equation 10 for all noise lengths up to $B = 44$. Some divergences are observed beyond this noise length, which can be considered as content-changing (malicious) manipulations.

Fig. 9. Comparison between the theoretical and the experimental probabilities that extracted features drop out from the quantization interval for various noise lengths.
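The comparison can be reproduced with a short Monte Carlo sketch (our own code; the random draws will of course not match the chapter's Figure 9 exactly):

```python
import numpy as np

rng = np.random.default_rng(2)

N = 10_000
a, b = -10.0, 10.0                    # feature interval, length Delta = b - a
delta = b - a
x = rng.uniform(a, b, N)              # features uniformly spread over [a, b]

for B in (2.0, 5.0, 10.0, 20.0):      # uniform noise on [-B/2, B/2]
    noisy = x + rng.uniform(-B / 2, B / 2, N)
    p_exp = np.mean((noisy < a) | (noisy > b))
    p_theo = B / (4 * delta)          # Equation (10)
    print(f"B = {B:4.1f}   experimental = {p_exp:.4f}   theoretical = {p_theo:.4f}")
```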

The same analysis can be performed for other noise distributions, such as the Gaussian or the triangular distribution. Thus, simply by modeling the content-preserving manipulations with one of these distributions, we can precisely obtain the probability with which the extracted features drop from a fixed quantization interval into its neighboring intervals. Alternatively, we can fix beforehand the maximum range of additive noise that we judge to be a content-preserving manipulation and the probability that extracted features change quantization interval. This allows us to fix the length of the quantization interval that respects this probability.

#### 3.2.2 Case of an additive Gaussian noise

Figure 10 shows an example of an original image of size 512 × 512 and its noisy versions with several levels of additive Gaussian noise controlled by its standard deviation $\sigma$. Note that the applied additive Gaussian noise has zero mean, and changing its standard deviation $\sigma$ allows us to increase or decrease its level.

Fig. 10. Original image (a) and its noisy versions with additive Gaussian noise of standard deviations $\sigma$ = 1, 5, 10, 11, 14, 15, 20, 25, 30, 35, 40 (panels (b)–(l)).

To evaluate the perceptual similarity between the original image and its modified versions, we can rely on the perceptual judgment provided by the Human Visual System (HVS), on the Structural SIMilarity (SSIM)<sup>1</sup> method (Wang et al., 2004), or on the Peak Signal to Noise Ratio (PSNR). Table 2 gives the SSIM and PSNR values for the noisy images obtained by applying different standard deviation values $\sigma$ of the additive Gaussian noise.

<sup>1</sup> SSIM is a classical measure well correlated to the Human Visual System. The SSIM values are real positive numbers lower than or equal to 1. The stronger the degradation, the lower the SSIM measure. An SSIM value of 1 means that the image is not degraded.
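The noisy versions and the PSNR column of Table 2 can be generated with a few lines of Python (an illustrative sketch using a random stand-in image; SSIM can be obtained, for instance, with scikit-image's structural_similarity if that library is available):

```python
import numpy as np

rng = np.random.default_rng(3)

def add_gaussian_noise(img, sigma):
    """Return the image corrupted by zero-mean Gaussian noise of standard deviation sigma."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255)

def psnr(original, distorted, peak=255.0):
    mse = np.mean((original.astype(float) - distorted) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

img = rng.integers(0, 256, (512, 512)).astype(np.uint8)   # stand-in for a grayscale test image
for sigma in (1, 5, 11, 40):
    print(sigma, round(psnr(img, add_gaussian_noise(img, sigma)), 2))
```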


The quality of each Gaussian noisy image is compared to the original image, and the images are classified into four categories: very similar, similar, different and very different. The changed images qualified as very similar or similar (Figures 10(b), 10(c), 10(d), 10(e)) must have the same perceptual hash as the original image, noted $I_{ident}$. Images qualified as different or very different from the original image (Figures 10(f), 10(g), 10(h), 10(i), 10(j), 10(k), 10(l)) must have a different perceptual hash, noted $I_{diff}$, as presented in Table 2.


| Standard deviation $\sigma$ | SSIM | PSNR (dB) | Image quality | Perceptual hash |
|---|---|---|---|---|
| 1 | 0.997 | 47.79 | very similar | $I_{ident}$ |
| 5 | 0.946 | 34.15 | similar | $I_{ident}$ |
| 10 | 0.828 | 28.16 | similar | $I_{ident}$ |
| 11 | 0.802 | 27.32 | similar | $I_{ident}$ |
| 14 | 0.728 | 25.25 | different | $I_{diff}$ |
| 15 | 0.704 | 24.70 | different | $I_{diff}$ |
| 20 | 0.600 | 22.24 | different | $I_{diff}$ |
| 25 | 0.517 | 20.36 | different | $I_{diff}$ |
| 30 | 0.450 | 18.86 | very different | $I_{diff}$ |
| 35 | 0.397 | 17.59 | very different | $I_{diff}$ |
| 40 | 0.354 | 16.50 | very different | $I_{diff}$ |

Table 2. SSIM and PSNR values for noisy images obtained by applying different standard deviation values $\sigma$ of the additive Gaussian noise.

In the case of $\sigma = 1$, the noisy image remains visually the same as the original image and it has high values of SSIM ($SSIM = 0.997$) and PSNR ($PSNR = 47.79$ dB). For $\sigma = 5$, $\sigma = 10$ and $\sigma = 11$, the changes in the noisy images are very small and we can consider that the noisy images are still similar to the original image. In the cases $\sigma = 5, 10, 11$, the SSIM values remain above 80% and the PSNR values remain above 27 dB. When the level of the additive Gaussian noise increases, the noisy images become perceptually different from the original image, as shown in Figure 10 for $\sigma = 14, \ldots, 40$, and both the SSIM and PSNR values degrade. We can fix the threshold of the additive Gaussian noise that still preserves the content, in the sense of human perception, at $\sigma = 11$, as justified by the SSIM and PSNR values. We fixed the degradation thresholds at an SSIM value of 80% and a PSNR value of 27 dB to consider a noisy image similar to the original image. These SSIM and PSNR thresholds are justified in terms of the subjective measure based on the HVS, over many tests that we have performed on a large database of grayscale images, as illustrated in Figure 10.

To theoretically address the influence of an additive Gaussian noise with zero mean and standard deviation $\sigma$ on a uniform distribution of features limited to an interval $[a, b]$, we compute the convolution product between the distribution of the extracted features and the distribution of the additive Gaussian noise, defined as follows:

• Let $P_\Delta(x)$ denote the extracted feature distribution limited to an interval $[a, b]$ of length $\Delta = b - a$. $P_\Delta(x)$ is given by:

$$P_\Delta(x) = \begin{cases} \frac{1}{\Delta} & \text{for } x \in [a, b], \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{11}$$


• Let $P_\sigma(x)$ denote the probability density function of the Gaussian noise with zero mean and standard deviation $\sigma$, which represents the content-preserving manipulations. $P_\sigma(x)$ is expressed as:

$$P_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \tag{12}$$

The convolution product $h(x)$ of $P_\Delta(x)$ and $P_\sigma(x)$ is:

$$\begin{aligned} h(x) &= \int_{-\infty}^{+\infty} P_\Delta(y)\, P_\sigma(x - y)\, dy \\ &= \frac{1}{\Delta}\left( \int_{-\infty}^{x-a} P_\sigma(y)\, dy - \int_{-\infty}^{x-b} P_\sigma(y)\, dy \right) \\ &= \frac{1}{\Delta}\left( \int_{-\infty}^{x-a} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}\, dy - \int_{-\infty}^{x-b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}\, dy \right) \\ &= \frac{1}{\Delta}\left( \int_{-\infty}^{\frac{x-a}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\, dy - \int_{-\infty}^{\frac{x-b}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\, dy \right) \\ &= \frac{1}{2\Delta}\left[ \mathrm{erf}\!\left(\frac{x-a}{\sqrt{2}\,\sigma}\right) - \mathrm{erf}\!\left(\frac{x-b}{\sqrt{2}\,\sigma}\right) \right] \end{aligned} \tag{13}$$

with $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$.

The convolution product $h(x)$ models the behavior of the original features, within each quantization interval, after adding the Gaussian noise. Figure 11 shows a normalized uniform distribution of 10000 features belonging to the interval [10, 20], before and after the quantization stage, with quantization step $Q = 10$. All these features are quantized to the value 15, as shown in Figure 11. Figure 12 presents the normalized distribution of the noisy features after adding a Gaussian noise with zero mean and standard deviation $\sigma = 2$. This distribution coincides exactly with the theoretical result given by Equation 13. As shown in Figure 12, the noisy features spread over 3 quantization intervals and are quantized to three values: 5, 15 and 25. The value 5 corresponds to features that dropped into the left neighboring quantization interval, and the value 25 to features that dropped into the right neighboring interval. Statistically, for these experimental settings, 8% of the features drop into the left neighboring quantization interval and 8% drop into the right one. For the other experimental settings, we always obtain a symmetric percentage of features dropping into the left and right neighboring quantization intervals.
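The experiment of Figures 11 and 12 is easy to reproduce; the sketch below (our own code, with the interval [10, 20], $Q = 10$ and $\sigma = 2$ taken from the text) also evaluates the density of Equation (13) at two sample points:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

a, b, Q, sigma = 10.0, 20.0, 10.0, 2.0
x = rng.uniform(a, b, 10_000)                      # features in one quantization cell
noisy = x + rng.normal(0.0, sigma, x.size)

def quantize(v, Q):
    return np.floor(v / Q) * Q + Q / 2             # mid-point quantizer, as in Equation (14)

qx, qn = quantize(x, Q), quantize(noisy, Q)
left  = np.mean(qn < qx)                           # dropped into the left neighbouring cell
right = np.mean(qn > qx)                           # dropped into the right neighbouring cell
print(f"moved left: {left:.1%}, moved right: {right:.1%}")   # roughly 8% each for sigma = 2

def h(t, a=a, b=b, s=sigma, delta=b - a):
    """Density of the noisy features, Equation (13)."""
    return (erf((t - a) / (sqrt(2) * s)) - erf((t - b) / (sqrt(2) * s))) / (2 * delta)

print(round(h(15.0), 4), round(h(9.0), 4))         # plateau value ~1/Delta, tail value
```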

#### 4. Experimental results

#### 4.1 Experimental analysis protocol

In this section, we describe the quantization analysis protocol for perceptual image hashing based on the statistical invariance of extracted block-mean features. The aim is to find an agreement between the density of the additive Gaussian noise, the size of the image block and the quantization step size that must be chosen to ensure a good level of image hashing robustness.


Fig. 11. 10000 original features uniformly distributed in one quantization interval [10, 20] before quantization (black) and after uniform quantization (green) where the quantization step *Q*=10.

Fig. 12. 10000 noisy features after adding Gaussian noise with zero mean and standard deviation $\sigma = 2$, before quantization (black) and after uniform quantization (green), where the quantization step $Q = 10$.

As shown in Figure 13, the original input image $I$ of size $N \times M$ pixels is split into non-overlapping blocks of size $q \times p$ pixels, noted $B_{i,j}$, where $i \in \{1, 2, \ldots, \frac{N}{q}\}$ and $j \in \{1, 2, \ldots, \frac{M}{p}\}$. The floating-point mean value $m_{i,j}$ of each block $B_{i,j}$ is computed and stored in a one-dimensional vector noted $V_m(k)$, where $k \in \{1, 2, \ldots, \frac{N}{q} \times \frac{M}{p}\}$. Quantization is the conventional way to discretize the continuous vector $V_m$. For a given quantization step size $Q$, the quantized vector $V'_m(k)$ of $V_m(k)$ is given by the floor operation:

$$V'_m(k) = \left\lfloor \frac{V_m(k)}{Q} \right\rfloor \times Q + \frac{Q}{2} \tag{14}$$

where $k \in \{1, 2, \ldots, \frac{N}{q} \times \frac{M}{p}\}$.
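A direct implementation of this block-mean feature extraction and of Equation (14) can look as follows (our own sketch; the 512 × 512 random array merely stands in for a real grayscale image):

```python
import numpy as np

def block_mean_vector(img, q, p):
    """Split img (N x M) into non-overlapping q x p blocks and return the block means V_m(k)."""
    N, M = img.shape
    blocks = img[:N - N % q, :M - M % p].reshape(N // q, q, M // p, p)
    return blocks.mean(axis=(1, 3)).ravel()

def quantize_means(vm, Q):
    """Equation (14): floor quantization of the block-mean vector, giving V'_m(k)."""
    return np.floor(vm / Q) * Q + Q / 2

rng = np.random.default_rng(5)
img = rng.integers(0, 256, (512, 512)).astype(float)   # stand-in for the input image I
vm = block_mean_vector(img, 8, 8)
print(quantize_means(vm, 4)[:5])
```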


The distribution $Dist_I$ of the quantized vector $V'_m$ is then calculated and stored as a reference, enabling us to compare it with the distributions of other candidate images in order to verify their integrity with respect to the original image.

Fig. 13. Proposed quantization analysis protocol for perceptual image hashing based on image block means.

The image hashing system assumes that the original image $I$ may be sent over a network consisting of possibly untrusted nodes. During this untrusted communication the original image could be manipulated for malicious purposes. Therefore, the received image $\bar{I}$ may undergo non-malicious operations, like JPEG compression, or malicious tampering. The final perceptual hash of $I$ should be used to authenticate its received version $\bar{I}$. In the case of non-malicious operations, the original feature vector and the received one should differ by a small Euclidean distance, which makes the quantization easier to control, and by a large Euclidean distance in the case of content-changing manipulations, which allows different results to be obtained after the quantization step. Note that even small changes of the feature vector under a small additive noise may cause false authentication of the received image $\bar{I}$, although it should be considered similar to $I$. The received image $\bar{I}$, which we simulate as the original image plus a Gaussian noise with zero mean and standard deviation $\sigma$, undergoes the same steps as the original image (Fig. 13), which yields the distribution $Dist_{\bar{I}}$ of $\bar{V}'_m(k)$. Let $V_m(k)$ be the mean of an original image block of size $q \times p$ pixels, whose pixels are noted $p_{i,j}$. In the same way, we note $\bar{V}'_m(k)$ the mean of the corresponding noisy image block, with pixels $p'_{i,j}$. $\bar{V}'_m(k)$ can be expressed as a function of $V_m(k)$ as follows:

$$\begin{split} \bar{V}'\_{m}(k) &= \frac{1}{p \times q} \sum\_{i=1}^{p} \sum\_{j=1}^{q} p'\_{i,j} \\ &= \frac{1}{p \times q} \sum\_{i=1}^{p} \sum\_{j=1}^{q} (p\_{i,j} + n\_{i,j}) \\ &= \frac{1}{p \times q} \sum\_{i=1}^{p} \sum\_{j=1}^{q} p\_{i,j} + \frac{1}{p \times q} \sum\_{i=1}^{p} \sum\_{j=1}^{q} n\_{i,j} \\ &= V\_{m}(k) + \frac{1}{p \times q} \sum\_{i=1}^{p} \sum\_{j=1}^{q} n\_{i,j} \end{split} \tag{15}$$

where $n_{i,j}$ is a Gaussian noise belonging to $\mathcal{N}(0, \sigma)$ and $k \in \{1, 2, \ldots, \frac{N}{q} \times \frac{M}{p}\}$.


The term " <sup>1</sup> *p*×*q p* ∑ *i*=1 *q* ∑ *j*=1 *ni*,*j*" in Equation 15 belongs to Gaussian distribution with 0-mean and standard deviation <sup>√</sup> • *<sup>p</sup>*×*<sup>q</sup>* .

$\bar{V}_m$ is the discrete vector which contains the quantized values of the computed means of the received image blocks. The comparison between $Dist_I$ and $Dist_{\bar{I}}$ gives the percentage of stable features that remained fixed after the additive Gaussian noise, the percentage of features that moved to the left neighbor quantization interval, and the percentage of features that moved to the right neighbor quantization interval. This information about the behavior of the features is very useful: it allows us to take into account the percentage of stable features that resist non-malicious operations, simulated here by additive Gaussian noise, and to tune the block-size division and the quantization step size so as to reach a targeted level of robustness of the image hashing system against a given level of additive noise. The selected features are then hashed in the "Compression and Encryption" step shown in Figure 1. The "Compression and Encryption" stage is achieved by the cryptographic hash function SHA-1, generating a final hash of 160 bits with a high level of security.
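As an illustration of this analysis, the following Python sketch computes the quantized block-mean features of an original image and of a noisy copy, measures the percentages of features that did not move or that moved to the left/right neighbor interval, and hashes the quantized feature vector with SHA-1. It is a minimal reconstruction assuming 8-bit grayscale arrays; all function and parameter names are ours, not the chapter's.

```python
import hashlib
import numpy as np

def quantized_block_means(img, block, Q):
    """Mean of each block x block tile, uniformly quantized with step Q."""
    h, w = img.shape
    tiles = img[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))              # V_m(k) for every block
    return np.floor(means / Q).astype(np.int64)  # index of the quantization interval

def stability(img, sigma, block=8, Q=4, seed=0):
    """Percentages of features that stayed in place or shifted by one interval."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0.0, sigma, img.shape)   # simulated non-malicious attack
    q0 = quantized_block_means(img, block, Q)
    q1 = quantized_block_means(noisy, block, Q)
    return {
        "not_moved": 100.0 * np.mean(q0 == q1),
        "moved_left": 100.0 * np.mean(q1 == q0 - 1),   # shifted to the left interval
        "moved_right": 100.0 * np.mean(q1 == q0 + 1),  # shifted to the right interval
    }

def perceptual_hash(img, block=8, Q=4):
    """'Compression and Encryption' step: SHA-1 over the quantized features."""
    features = quantized_block_means(img, block, Q)
    return hashlib.sha1(features.tobytes()).hexdigest()   # 160-bit digest

if __name__ == "__main__":
    img = np.random.default_rng(1).integers(0, 256, (512, 512)).astype(float)
    print(stability(img, sigma=5, block=8, Q=4))
    print(perceptual_hash(img))
```

Sweeping the block size over {4, 8, 16}, *Q* over {1, 4, 16} and σ over {1, 5, 11, 14, 40} with this function reproduces the structure of Table 4 below, although the exact percentages depend on the test image.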

#### 4.2 Experimental analysis of the quantization problem in a perceptual image hashing system

In the experiments of the proposed scheme, the features are the means of image blocks of different sizes. The image blocks are of size 4 × 4, 8 × 8 and 16 × 16. They are then quantized with different quantization step sizes: *Q*=1, *Q*=4 and *Q*=16. In other words, for each given quantization step size, we tested different image block sizes against different levels of additive Gaussian noise. The experiments were carried out on a large database of grayscale images of size 512 × 512. Table 3 shows the variation of the mean distribution for different image block sizes and different levels of additive Gaussian noise in the case of quantization step size *Q* = 4, applied to the image of Figure 10(a). For quantization step size *Q*=4 and standard deviation σ=1 (Figure 10(b), Table 3), we observe that the number of unstable mean block features decreases when we increase the block size. We also note that the percentage of stable mean block features is significant even for a block size of 4 × 4 (Table 4). When the standard deviation of the additive Gaussian noise increases (case of σ=5 shown in Table 3) while the visual content of the noisy image remains the same as that of the original image 10(a), the percentage of stable mean block features decreases compared to the case of σ=1. When the visual content of the noisy/attacked image differs from the original one (Figure 10(l), case of σ=40), we observe that only a few mean block features remain stable for all the block sizes that we tested, as shown in Table 3.

The numerical results in Table 4 give, for each image block size, the percentage of features that have not moved and remain stable under different levels of additive Gaussian noise, and also the percentage of those that moved from the left or from the right neighbor quantization interval. As can be observed in Table 4, the percentage of stable features that remain fixed after adding Gaussian noise decreases when the level of the noise increases. For the same level of noise, the percentage of stable features increases when the image block size increases. Thus, if we set the quantization step size to *Q* = 1, we can take into account the percentage of stable features that resist a tolerable level of additive Gaussian noise. For example, if we fix the quantization step size *Q* to the value 1 and we consider an image which undergoes tolerable manipulations equivalent to an additive Gaussian noise with standard deviation σ = 5, we choose a compromise between the percentage of stable features and the size of the image block decomposition.


| *Q* | Block Size | σ | Not Moved (%) | Moved from the Right (%) | Moved from the Left (%) |
|-----|-----------|---|---------------|--------------------------|-------------------------|
| 1 | 4 × 4 | 1 | 79.4128 | 10.4004 | 10.1868 |
| 1 | 4 × 4 | 5 | 30.5237 | 34.8633 | 34.6130 |
| 1 | 4 × 4 | 11 | 14.5569 | 42.5842 | 42.8589 |
| 1 | 4 × 4 | 14 | 11.4258 | 43.0176 | 45.5566 |
| 1 | 4 × 4 | 40 | 4.1382 | 47.5281 | 48.3337 |
| 1 | 8 × 8 | 1 | 90.7471 | 4.5410 | 4.7119 |
| 1 | 8 × 8 | 5 | 53.4180 | 23.3643 | 23.2178 |
| 1 | 8 × 8 | 11 | 29.0283 | 35.0586 | 35.9131 |
| 1 | 8 × 8 | 14 | 23.0957 | 36.3037 | 40.6006 |
| 1 | 8 × 8 | 40 | 7.6660 | 46.5332 | 45.8008 |
| 1 | 16 × 16 | 1 | 94.9219 | 2.1484 | 2.9297 |
| 1 | 16 × 16 | 5 | 77.4414 | 11.8164 | 10.7422 |
| 1 | 16 × 16 | 11 | 52.9297 | 23.5352 | 23.5352 |
| 1 | 16 × 16 | 14 | 41.2109 | 27.2461 | 31.5430 |
| 1 | 16 × 16 | 40 | 14.2578 | 43.5547 | 42.1875 |
| 4 | 4 × 4 | 1 | 94.6960 | 2.7710 | 2.5330 |
| 4 | 4 × 4 | 5 | 74.7864 | 12.8540 | 12.3596 |
| 4 | 4 × 4 | 11 | 50.4456 | 24.6826 | 24.8718 |
| 4 | 4 × 4 | 14 | 41.6382 | 28.4851 | 29.8767 |
| 4 | 4 × 4 | 40 | 15.9119 | 41.8457 | 42.2424 |
| 4 | 8 × 8 | 1 | 97.5098 | 1.2939 | 1.1963 |
| 4 | 8 × 8 | 5 | 87.6221 | 6.0547 | 6.3232 |
| 4 | 8 × 8 | 11 | 73.7061 | 12.7930 | 13.5010 |
| 4 | 8 × 8 | 14 | 66.2598 | 15.4297 | 18.3105 |
| 4 | 8 × 8 | 40 | 29.2725 | 35.8154 | 34.9121 |
| 4 | 16 × 16 | 1 | 98.9258 | 0.5859 | 0.4883 |
| 4 | 16 × 16 | 5 | 94.7266 | 3.0273 | 2.2461 |
| 4 | 16 × 16 | 11 | 88.9648 | 4.9805 | 6.0547 |
| 4 | 16 × 16 | 14 | 83.3984 | 7.7148 | 8.8867 |
| 4 | 16 × 16 | 40 | 45.7031 | 28.5156 | 25.7812 |
| 16 | 4 × 4 | 1 | 98.6694 | 0.6714 | 0.6592 |
| 16 | 4 × 4 | 5 | 93.9575 | 3.0273 | 3.0151 |
| 16 | 4 × 4 | 11 | 86.3953 | 6.6162 | 6.9885 |
| 16 | 4 × 4 | 14 | 82.8918 | 8.0811 | 9.0271 |
| 16 | 4 × 4 | 40 | 53.8086 | 22.5220 | 23.6694 |
| 16 | 8 × 8 | 1 | 99.4141 | 0.2686 | 0.3174 |
| 16 | 8 × 8 | 5 | 96.7529 | 1.5625 | 1.6846 |
| 16 | 8 × 8 | 11 | 93.5059 | 3.0518 | 3.4424 |
| 16 | 8 × 8 | 14 | 91.7969 | 3.5645 | 4.6387 |
| 16 | 8 × 8 | 40 | 74.6826 | 12.5244 | 12.7930 |
| 16 | 16 × 16 | 1 | 99.9023 | 0.0000 | 0.0977 |
| 16 | 16 × 16 | 5 | 98.7305 | 0.7812 | 0.4883 |
| 16 | 16 × 16 | 11 | 96.5820 | 1.3672 | 2.0508 |
| 16 | 16 × 16 | 14 | 95.2148 | 1.9531 | 2.8320 |
| 16 | 16 × 16 | 40 | 85.3516 | 6.9336 | 7.7148 |

Table 4. Numerical results for different levels of the additive Gaussian noise and image block size in the case of the quantization step sizes *Q* = 1, *Q* = 4 and *Q* = 16.

For a block size of 4 × 4 the maximum percentage of stable features that can be taken into account is ≈ 30%, and for a block size of 8 × 8 it is ≈ 54%. The highest percentage of stable features, ≈ 77%, is obtained if a 16 × 16 decomposition is applied in the preprocessing of the image. We ran this experiment on a large database of grayscale images of size 512 × 512 and observed that the values presented in Table 4 are obtained approximately for other images with the same settings of image block decomposition and Gaussian noise addition. We also noted that the percentages of features that moved from the left and those that moved from the right are approximately equal, which agrees with the theoretical study presented in Section 3.2.2. The same approximate equality of the left and right percentages is observed in the cases of *Q*=4 and *Q*=16 as well as for the quantization step size *Q*=1. These numerical values remain almost fixed for the same settings of block image decomposition and level of Gaussian noise addition, because the experiments were run on a large database of grayscale images; the values reported here were obtained for the grayscale image shown in Figure 10(a) and can be obtained for any other grayscale image.

Based on the numerical results presented in Table 4, Figure 14 shows the percentage of features that remain stable under additive Gaussian noise for the different image block decompositions. As we can see, to obtain a high percentage of stable features we have two possibilities: either we apply a large image block size in the decomposition, or the original image undergoes only weak additive Gaussian noise.

Fig. 14. Stability percent of mean features versus block size (4x4, 8x8, 16x16) for noise levels σ = 1, 5, 11, 14 and 40 and a fixed quantization step size: (a) case of quantization step size *Q* = 1, (b) case of quantization step size *Q* = 4 and (c) case of quantization step size *Q* = 16.




#### 5. Conclusion

In this chapter, we introduced the main aim of the perceptual image hashing field in image security. We presented the important merits and requirements of a perceptual image hash function used for authentication, and gave a formulation of the perceptual image hashing problem. We dedicated a section to an overview of recent techniques used for perceptual image hashing. Next, we presented the different quantization techniques used to improve the robustness of a perceptual image hashing scheme, showing their advantages and their limitations. Finally, we presented a theoretical model describing the behavior of the extracted image features to be hashed against content-preserving/content-changing manipulations. In the presented analysis, we simulated the manipulations that the original image may undergo by an additive Gaussian noise. We tested the presented model with several experiments to demonstrate its effectiveness and to provide a practical analysis for robust perceptual image hashing. The presented model is applied to image hashing based on the statistical invariance of mean block features. The obtained results confirm the theoretical study presented in Section 3.2. Some approximations must be made to improve the results. The same study can be generalized to other features in block-based image hashing schemes, such as DCT domain features, DWT domain features, etc.



**3** 



## **Robust Multiple Image Watermarking Based on Spread Transform**

Jaishree Jain and Vijendra Rai
*Mahamaya Technical University, Noida, India*

#### **1. Introduction**


In this chapter, some multiple watermarking techniques and their limitations are discussed, including both spatial and transform domain methods. Since many algorithms are applied to graphical images, the concept of graphical image perceptibility and the PSNR and Bit Error Ratio (BER) measures are also discussed.
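For reference, the two quality measures named above can be computed as follows; this is a generic sketch using the standard definitions (PSNR in dB with respect to the 8-bit peak value, BER as the fraction of mismatched watermark bits), not code taken from this chapter.

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two grayscale images."""
    mse = np.mean((original.astype(float) - distorted.astype(float)) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def ber(bits_sent, bits_received):
    """Bit Error Ratio between the embedded and the extracted watermark bits."""
    bits_sent = np.asarray(bits_sent, dtype=int)
    bits_received = np.asarray(bits_received, dtype=int)
    return float(np.mean(bits_sent != bits_received))
```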

Watermarks were used to keep track of paper provenance, and thus of format and quality, in the art of handmade papermaking nearly 700 years ago. The term watermark was first used in 1993. In 1993-1994 the first papers on digital watermarking were published, and in 1995 the first special session on image watermarking was held at NSIP'95 in Neos Marmaras, Greece. In 1995 one of the first image watermarking algorithms, the Patchwork algorithm, was proposed. Watermarking has developed essentially from two different streams: cryptography, meaning secret writing, and steganography, which in Greek means cover writing.

This is the era of the digital information revolution, with connectivity over the Internet and through wireless networks. Innovative devices such as digital cameras and camcorders, high quality scanners and printers have reached consumers worldwide to create, manipulate and enjoy multimedia data. The development of high speed computer networks, and of the Internet in particular, has opened up new business, scientific, entertainment and social opportunities in the form of electronic publishing and advertising, real-time information delivery, product ordering, transaction processing, digital repositories and libraries, personal communication, etc.

Digital content is spreading rapidly around the world via the Internet. Any number of copies can be produced from the original data without limitation: copying is simple, involves no loss of fidelity, and a copy of a digital medium is identical to the original. This has, in many instances, led to the use of digital content with malicious intent. The current rapid development of new IT technologies for multimedia services has resulted in a strong demand for reliable and secure copyright protection techniques for multimedia data. One way to protect multimedia data against illegal recording and retransmission is to embed a signal, called a digital signature, copyright label or watermark, that authenticates the owner of the data.


With the ease of editing and perfect reproduction in the digital domain, the protection of ownership and the prevention of unauthorized tampering of multimedia data (audio, image, video, and document) have become important concerns. Digital watermarking schemes, which embed secondary data in digital media, have made considerable progress in recent years and attracted attention from both academia and industry. Techniques have been proposed for a variety of applications, including ownership protection, authentication and access control. Imperceptibility, robustness against moderate processing such as compression, and the ability to hide many bits are the basic but rather conflicting requirements of many data hiding applications.

Digital watermarking is a technique to embed invisible or inaudible data within multimedia content. Watermarked content carries particular data for copyright purposes. The hidden data is called a watermark, and its format can be an image or any other type of media. In case of ownership conflict during distribution, digital watermarking makes it possible to search for and extract the grounds for ownership. Much research on watermarking has come out of the advanced countries, including the USA and the EU, because of the importance of this area for the future.

To avoid the unauthorized distribution of images or other multimedia property, various solutions have been proposed. Most of them make unobservable modifications to images that can be detected afterwards. Such image changes are called watermarks. Watermarking is defined as adding (embedding) a payload signal to the host signal. The payload can be detected or extracted later to make an assertion about the object i.e. the original data that may be an image or audio or video.

Multiple watermarking is a branch of digital watermarking which has many desirable characteristics that common single watermarking does not have, such as robustness to union attacks. For example, multiple watermarks can be employed to convey multiple sets of information intended to satisfy differing or similar goals; they can be used to increase robustness, since with many different methods the embedded information is not easily lost; and they make it possible to support different access levels. To accomplish several goals, one might wish to embed several watermarks into the same image. For example, the owner might desire to use one watermark to convey ownership information, a second watermark to verify content integrity, and a third watermark to convey a caption.

The aim of watermarking is to include subliminal (i.e., imperceptible) information in a multimedia document to ensure a security service or simply a labeling application. However, existing multiple watermarking has inherent problems such as low validity and high complexity.

In general, any watermarking scheme (algorithm) consists of three parts:

- The watermark (payload)
- The encoder (marking insertion algorithm)
- The decoder and comparator (verification or extraction or detection algorithm)


Each owner has a unique watermark, or an owner can also put different watermarks in different objects. The marking algorithm incorporates the watermark into the object. The verification algorithm authenticates the object, determining both the owner and the integrity of the object.

#### **1.1 Watermark insertion and extraction**

Watermark insertion involves watermark generation and encoding process.

#### **1.1.1 Watermark generation**


The watermark can be a logo picture, sometimes a binary picture, sometimes a ternary picture; it can be a bit stream or an encrypted bit stream, etc. The encryption may take the form of a hash function or encryption using a secret key. The watermark generation process varies with the owner.
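As a rough illustration of such a generation step, the sketch below turns a binary logo into a watermark bit stream and optionally scrambles it with a key-seeded pseudo-random sequence; the XOR scrambling is just one possible choice and, like all names here, is our assumption rather than a method prescribed by the chapter.

```python
import numpy as np

def generate_watermark(logo, secret_key=None):
    """Turn a binary logo (2-D array of 0/1) into a watermark bit stream.

    If a secret key is given, the bits are XOR-ed with a key-seeded
    pseudo-random binary sequence, so only the key holder can recover
    the original logo bits.
    """
    bits = (np.asarray(logo) > 0).astype(np.uint8).ravel()
    if secret_key is not None:
        rng = np.random.default_rng(secret_key)           # the key acts as the seed
        keystream = rng.integers(0, 2, size=bits.size, dtype=np.uint8)
        bits = bits ^ keystream                            # simple scrambling
    return bits
```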

#### **1.1.2 Encoding process**

Inputs to the embedding scheme are the watermark, the cover data and an optional public or secret key. The output is watermarked data. The key is used to enforce security.

Fig. 1. Embedding Process
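To make the embedding step concrete, here is a minimal additive, spread-spectrum-style sketch: a key-seeded pseudo-random ±1 pattern is added to or subtracted from each band of cover pixels according to the corresponding watermark bit. This is a generic textbook illustration under our own assumptions (band layout, strength parameter alpha); it is not the spread-transform method developed later in this chapter. A matching detector is sketched after Fig. 3 below.

```python
import numpy as np

def embed(cover, bits, secret_key, alpha=2.0):
    """Embed watermark bits into a grayscale cover image (additive scheme).

    Each bit modulates a key-seeded pseudo-random +/-1 pattern covering one
    horizontal band of the image; alpha controls the embedding strength and
    hence the imperceptibility/robustness trade-off.
    """
    cover = cover.astype(float)
    rng = np.random.default_rng(secret_key)
    h, w = cover.shape
    bands = np.array_split(np.arange(h), len(bits))   # one band of rows per bit
    marked = cover.copy()
    for bit, band in zip(bits, bands):
        pattern = rng.choice([-1.0, 1.0], size=(len(band), w))
        sign = 1.0 if bit else -1.0
        marked[band, :] += alpha * sign * pattern      # add or subtract the pattern
    return np.clip(marked, 0, 255)
```

The parameter alpha directly trades imperceptibility against robustness, echoing the conflicting requirements discussed later in Section 1.2.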

#### **1.1.3 Watermark extraction**

Extraction is achieved in two steps. First the watermark or payload is extracted in the decoding process and then the authenticity is established in the comparing process.

**1. Decoding process**: Inputs to the decoding scheme are the watermarked data, the secret or public key and, depending on the method, the original data and/or the original watermark. The output is the recovered watermark W.

Fig. 2. Extraction Process


**2. Comparison Process:** The extracted watermark is compared with the original watermark by a comparator function and a binary output decision is generated. The comparator is basically a correlator. Depending on the comparator output it can be determined whether the data is authentic or not: if the comparator output is greater than or equal to a threshold then the data is authentic, otherwise it is not. Figure 3 illustrates the comparison function. In this process the extracted watermark and the original watermark are passed through a comparator. The comparator output C is then compared with a threshold and a binary output decision is generated: it is 1 if there is a match, i.e. if C reaches the threshold, and 0 otherwise. To be useful, a watermark must be detectable or extractable; depending on the way the watermark is inserted and on the nature of the watermarking algorithm, the method used can involve very distinct approaches. In some watermarking schemes, a watermark can be extracted in its exact form, a procedure we call watermark extraction. In other cases, we can detect only whether a specific given watermarking signal is present in an image, a procedure we call watermark detection. It should be noted that watermark extraction can prove ownership whereas watermark detection can only verify ownership [5].

Fig. 3. Comparison Process
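A minimal detector matching the embedding sketch of Section 1.1.2 could look as follows: it regenerates the key-seeded patterns, decodes each bit from the sign of its correlation with the received image, and the comparator then turns the fraction of matching bits C into a binary decision against a threshold. Again, this is an illustrative sketch under our own assumptions, not the chapter's algorithm.

```python
import numpy as np

def extract(received, n_bits, secret_key):
    """Decode watermark bits by correlating with the key-seeded patterns."""
    received = received.astype(float)
    rng = np.random.default_rng(secret_key)
    h, w = received.shape
    bands = np.array_split(np.arange(h), n_bits)
    bits = []
    for band in bands:
        pattern = rng.choice([-1.0, 1.0], size=(len(band), w))
        block = received[band, :]
        corr = np.sum((block - block.mean()) * pattern)   # correlation with the pattern
        bits.append(1 if corr > 0 else 0)
    return np.array(bits, dtype=np.uint8)

def compare(extracted_bits, original_bits, threshold=0.9):
    """Comparator: normalized agreement score C against a decision threshold."""
    c = float(np.mean(extracted_bits == original_bits))   # fraction of matching bits
    return (1 if c >= threshold else 0), c
```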

#### **1.2 Practical challenges of watermarking**

A watermark by itself is not sufficient to prevent abuses unless a proper protection protocol is established. The exact properties that a watermarking algorithm must satisfy cannot be defined without considering the particular application scenario the algorithm has to be used in. A brief analysis of the requirements of data hiding algorithms from a protocol perspective permits us to decide whether a given algorithm is suitable for a certain application or not. Each watermarking application has its own specific requirements. More often than not these requirements have conflicting effects on each other. A good watermarking algorithm obtains an optimal trade-off between these requirements, is not weakened or destroyed by attacks, both malicious and non-malicious, and at the same time unambiguously identifies the owner. These properties can be broadly classified as primary and secondary requirements. The primary requirements include data hiding capacity, imperceptibility and robustness, as shown in Figure 4. However, these three characteristics conflict with each other. Increasing the fidelity of the watermarked images (i.e. increasing the imperceptibility of the mark) lowers the strength of the watermark. Embedding a large amount of information reduces the fidelity of the watermark. The secondary requirements include performance, i.e. the speed of embedding and of detection of the watermark. These attributes, though less commonly discussed, are very important for many real-world applications. Each of the primary attributes is discussed in detail below.

Fig. 4. Primary Requirements of Watermarking Algorithms

#### **1.2.1 Capacity of watermarking techniques**

Capacity is a fundamental property of any watermarking algorithm, which very often determines whether a technique can be profitably used in a given context or not. However no requirement can be set without considering the application the technique has to serve in. Possible requirements range from some hundreds of bits in security oriented applications, where robustness is a major concern, through several thousands of bits in applications like captioning or labeling, where the possibility of embedding a large number of bits is a primary need. For copy protection purposes, a payload of one bit is usually sufficient. Capacity requirements always struggle against two other important requirements, watermark imperceptibility and watermark robustness. A higher capacity is always obtained at the expense of either robustness or imperceptibility or both. It is therefore mandatory that a good trade-off be found depending on the application at hand.

#### **1.2.2 Imperceptibility**


The watermark should be imperceptible so as not to affect the viewing experience of the image or the quality of the image signal. In most applications the watermarking algorithm must embed the watermark such that this does not affect the quality of the underlying host data. A watermark embedding procedure is truly imperceptible if humans cannot distinguish the original data from the data with the inserted watermark. However even the smallest modification in the host data may become apparent when the original data is compared directly with the watermarked data. Since users of watermarked data normally do not have access to the original data, they cannot perform this comparison. Therefore, it may be sufficient that the modifications in the watermarked data go unnoticed as long as the data are not compared with the original data.

#### **1.2.3 Robustness**

Watermark robustness accounts for the capability of the hidden data to survive host signal manipulations, including both non-malicious manipulations, which do not explicitly aim at removing the watermark or at making it unreadable, and malicious manipulations, which precisely aim at damaging the hidden information. The exact level of robustness the hidden data must possess cannot be specified without considering a particular application. Robustness against signal distortion is better achieved if the watermark is placed in perceptually significant parts of the signal. This is particularly evident in the case of lossy compression algorithms, which operate by discarding perceptually insignificant data. Watermarks hidden within perceptually insignificant data are likely not to survive compression. Achieving watermark robustness, and, to a major extent, watermark security, is one of the main challenges watermarking researchers are facing.

#### **1.3 Watermarking attacks**

Any procedure that can decrease the performance of the watermarking scheme may be termed an attack. Voloshynovskiy et al. [1] categorize attacks into four classes, viz. removal, geometric, cryptographic and protocol attacks.


Removal attacks remove the watermark without having any prior knowledge about the watermark, while geometric attacks deal with de-synchronization of the receiver so that watermark detection is distorted. Cryptographic attacks are those that tend to crack the watermarking scheme, and protocol attacks exploit invertible watermarks to cause ownership ambiguity. These attacks can be broadly classified as non-malicious (unintentional), such as compression of a legally obtained, watermarked image or video file, and malicious (intentional), such as an attempt by a multimedia pirate to destroy the embedded information and prevent tracing of illegal copies of watermarked digital video. Watermarking systems utilized in copy protection or data authentication schemes are especially susceptible to malicious attacks. Non-malicious attacks usually come from common signal processing operations done by legitimate users of the watermarked materials.

#### **1.3.1 Malicious attack**

An attack is said to be malicious if its main goal is to remove or make the watermark unrecoverable. Malicious attacks can be further classified into two different classes.

**Blind:** A malicious attack is said to be blind if it tries to remove or make the watermark unrecoverable without exploiting knowledge of the particular algorithm that was used for watermarking the asset. An example is the copy attack, which estimates the watermark signal with the aim of adding it to another asset.

**Informed:** A malicious attack is said to be informed if it attempts to remove or make the watermark unrecoverable by exploiting knowledge of the particular algorithm that was used for watermarking the asset. Such an attack first extracts some secret information about the algorithm from publicly available data and then based on this information nullifies the effectiveness of the watermarking system. Examples of malicious attacks: Printing and Rescanning.

#### **1.3.2 Non-malicious attack**


An attack is said to be non-malicious if it results from the normal operations that watermarked data, or any data for that matter, has to undergo, such as storage, transmission or fruition. The nature and strength of these attacks depend strongly on the application for which the watermarking system is devised; typical examples are lossy compression, geometric and temporal manipulations, digital-to-analogue conversion, extraction of asset fragments (cropping) and processing aimed at enhancing the asset (e.g. noise reduction). Lossy compression: many compression schemes, such as JPEG and MPEG, can degrade the data's quality through irretrievable loss of information.

Geometric Distortions: Geometric distortions are specific to images and videos and include such operations as rotation, translation, scaling and cropping.

#### **2. Different types of watermarks and watermarking techniques**

#### **2.1 Visible watermark**

Visible watermarks are watermarks whose presence is perceptible to the viewer. For example, to indicate ownership of originals, the content owner desires a visible mark that makes the source of the material clear.

#### **i. Spatial domain visible watermarking**

A patchwork-based algorithm was proposed by N. Memon and P. Wong in 1998 [2]. The authors select n patches at random and compute statistics over them, using these patches as the watermark. This method is resistant to attempts at data removal by a third party, but the scheme is extremely sensitive to geometric transformations. If the patches are very small with sharp edges, the watermark can be removed by lossy compression, and the optimal choice of patch shape depends on the expected image modifications. Because of the limitations of spatial domain techniques, visible watermarking has also been developed in the transform domain.

#### **ii. Transform domain visible watermarking**

A DCT domain visible watermarking technique for images [3] was developed by S. P. Mohanty, et al. The technique modifies DCT coefficients of the cover image and exploits the texture sensitivity of the human visual system. The perceptual quality of the image is better preserved in this technique as compared to the previous one but this technique is not robust for images having very few objects and large uniform areas.

#### **2.2 Invisible watermark**

The invisible watermark's existence should be determined only through a watermark extraction or detection algorithm. The invisible watermark falls into three categories:

#### **1. Fragile watermarking**

Invisible image watermarks that change, or disappear, if the watermarked image is altered are called fragile watermarks. They are called fragile because they are intended to be altered or destroyed by most common image processing operations. A typical example is invisible watermarking for a trustworthy camera.


Fig. 5. Watermark Insertion Process [4]

A Fragile Watermarking Scheme for Image Authentication with Tamper Localization Using Integer Wavelet Transform was proposed by M. Venkatesan, et al. [4] in the spatial domain. The watermark is randomly scattered in the LSBs of the cover image. The technique is capable of detecting and localizing malicious changes in the cover image, and it can discriminate between watermark tampering and content tampering. Its only limitation is that the relationship between the reliability of tamper detection and the localization accuracy has not been investigated.

Fig. 6. Preprocessing [4]

#### **2. Semi-fragile watermarking**

These are the watermarking systems where content needs to be strictly protected, but the exact representation during exchange and storage need not be guaranteed. Semi fragile watermarking methods validate image content, but not its representation, and are thus made robust against allowable alterations, while being sensitive to non permitted modifications. For example, Semi fragile tamper detection methods are designed to monitor changes in the content and tamper detection is based on the visual assessment of perceived differences by an operator.

An invisible watermarking technique for image verification was proposed by Yeung, M.M. and Mintzer, F. [5] in the spatial domain. The technique is based on the least significant bit method, and the verification key is generated using a look-up table (LUT). The method can localize the regions of image alteration and is therefore effective for tamper detection. The watermarking process does not introduce visual artifacts, retains the quality of the image, and provides protection against retention of the watermark after unauthorized alterations. Because the LUT is generated randomly, the pixel values may have to be adjusted by larger amounts to obtain the desired unary value.


Fig. 7. The block Diagram of the Image Verification System with Proposed Invisible Watermarking Technique [5]

Semi-fragile watermarking based on the wavelet transform was proposed by Yuichi Nakai [6]. The technique embeds the watermark into wavelet coefficients so that the degree of tampering can be evaluated for each pixel: the MSBs of the watermark are embedded in the low-frequency components and the LSBs in the high-frequency components. The scheme can evaluate the degree of tampering for each pixel, but the number of watermarks that can be embedded without degrading image quality is small.

#### **3. Robust watermarking**

Watermarks that persist even if someone tries to remove them are called robust watermarks. Since they are intended to survive intentional attacks (e.g. active attacks, passive attacks, etc.), they are referred to as robust image watermarks. A typical application is evidence of ownership.


Van Schyndel, et al. developed robust watermarking in the paper ''A Digital Watermark'' [7] in the spatial domain. The original 8-bit grey scale image data is compressed to 7 bits by adaptive histogram manipulation. The watermark is generated using an m-sequence generator, embedded in the LSB of the original image, and detected by cross-correlation. The method uses linear addition of the watermark data, which makes it more difficult to decode and offers inherent security, and the technique is compatible with JPEG processing. The watermark is, however, not robust to additive noise.

Fig. 8. Embedding 8-ary Watermarks in Several Wavelet Coefficient Level [7]

I.A. Nasir has divided the host image into four different regions each consisting of 128 ×128 blocks in order to hide a watermark [8]. The watermark is a binary image encrypted and embedded into different regions of the blue component of the image by altering intensity values of the selected regions. The watermarks can be extracted by comparing the intensities of the selected region of the original image with the corresponding region of the watermarked image. The proposed watermarking scheme is robust for a wide range of attacks including JPEG compression, rotation, scaling, filtering, etc. The number of watermarks that can be embedded effectively is not statistically proved.

#### **3. Multiple watermarking basics**

Multiple watermarking is a branch of digital watermarking with many desirable characteristics that common single watermarking does not have: multiple watermarks can convey multiple sets of information intended to satisfy differing or similar goals, they can be used to increase robustness with many different methods so that the embedded information is not easily lost, and they make it possible to support different access levels. To accomplish several goals, one might wish to embed several watermarks into the same image. For example, the owner might desire to use one watermark to convey ownership information, a second watermark to verify content integrity, and a third watermark to convey a caption [9]. In general, to apply multiple disparate watermarks, ownership watermarks should be very robust, captioning watermarks should be robust, and verification watermarks should be quite fragile; the most robust (ownership) watermark should be embedded first, the most fragile (verification) watermark should be embedded last, and moderately robust (captioning) watermarks should be inserted in between.

Embedding multiple watermarks will then be successful if the robust watermarks are sufficiently robust to withstand all subsequent watermark insertions. After the insertion of multiple watermarks, the watermarked image will possess texture resulting from each watermark. Embedding multiple watermarks also requires that each watermark add less texture than would be permissible.

#### **3.1 Types of multiple watermark**


The multiple watermarking is broadly classified into three categories [10] as follows:

#### **i. Composite watermarking**

All watermarks are combined into a single watermark which is subsequently embedded in one single embedding step. The composite watermarks are separable if the watermarking patterns are orthogonal (or uncorrelated) in some sense relevant to the watermark detection. Example: Averaged watermarking

#### **ii. Segmented watermarking**

The host data is partitioned into disjoint segments a priori and each watermark is embedded into its specific share. If all keys are present the detector can find a watermark in every segment, otherwise it cannot. Example: Interleaved watermarking.

#### **iii. Successive watermarking**

It is the most straightforward method to embed the watermarks one after the other.

This method is useful in the applications where retrieval of one watermark should depend on the retrieval of other watermark. For example, it allows us to determine the order in which the watermarks are embedded. The object becomes more degraded with every new watermark inserted into it, both in terms of PSNR and perceived quality. Example: Rewatermarking.

In general, to apply multiple disparate watermarks, the most robust (ownership) watermark should be embedded first, the most fragile (verification) watermark should be embedded last, and moderately robust (captioning) watermarks should be inserted in between. Embedding multiple watermarks will be successful if the robust watermarks are sufficiently robust to withstand all subsequent watermark insertions. After the insertion of multiple watermarks, the watermarked image will possess texture resulting from each watermark. Embedding multiple watermarks also requires that each watermark add less texture than would be permissible.

#### **3.2 Multiple watermarking techniques**

The different watermarking techniques are broadly classified between two domains, namely spatial and transform domain.


#### **3.2.1 Spatial domain**

The spatial techniques insert the watermark in the underused least significant bits of the image. This allows a watermark to be inserted in an image without affecting the value of the image. Example: Least Significant Bit, Statistical, block based method. The most common implementation of spatial domain watermarking is Least Significant Bit (LSB) replacement method. It involves replacing the n least significant bits of each pixel of a container image with the data of a hidden image. Since the human visual system is not very attuned to small variations in color, the method adjusts the small differences between adjacent pixels leaving the result virtually unnoticeable.
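As a rough illustration of the LSB replacement idea described above (a minimal sketch, not any particular published scheme; the array names and bit budget are assumptions):

```python
import numpy as np

def embed_lsb(cover: np.ndarray, watermark_bits: np.ndarray) -> np.ndarray:
    """Replace the LSB of the first len(watermark_bits) cover pixels with watermark bits."""
    stego = cover.astype(np.uint8).flatten()
    bits = watermark_bits.astype(np.uint8) & 1
    stego[:bits.size] = (stego[:bits.size] & 0xFE) | bits   # clear the LSB, then set it
    return stego.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Read back the first n_bits least significant bits."""
    return stego.flatten()[:n_bits] & 1

# toy usage
cover = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
wm = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=np.uint8)
stego = embed_lsb(cover, wm)
assert np.array_equal(extract_lsb(stego, wm.size), wm)
```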

#### **3.2.2 Transformed domain techniques**

In the transform domain approach, some sort of transforms is applied to the original image first. The transform applied may be (DCT), (DFT), (DWT), etc. The watermark is embedded by modifying the transform domain coefficients. Example: DFT, DCT, DWT, Spread Spectrum.

Traditional watermarking schemes consisted of visible watermarking. Applications now demand that the embedded watermark be highly robust to attacks. Techniques for hiding information in images include the discrete cosine transform (DCT), the discrete Fourier transform (DFT) and the wavelet transform.

#### **i. Discrete cosine transform**

This is the most commonly used transform for watermarking. The DCT allows an image to be broken up into different frequency bands, making it much easier to embed watermarking information into the middle frequency bands of an image. In our technique we use middle-band DCT coefficients to encode the message; this avoids the most visually important parts of the image without overly exposing the watermark to removal through compression and noise attacks.
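To make the middle-band idea concrete, the following sketch (an illustrative assumption, not the exact coefficient set of any cited scheme) computes the 8×8 block DCT with SciPy and marks the positions whose frequency order u + v falls in a middle range:

```python
import numpy as np
from scipy.fft import dctn, idctn

def middle_band_mask(low: int = 5, high: int = 10) -> np.ndarray:
    """Boolean mask of 8x8 DCT positions whose frequency order u+v lies in [low, high]."""
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    return (u + v >= low) & (u + v <= high)

def embed_in_middle_band(block: np.ndarray, pattern: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Add a small +/-1 pattern to the middle-band DCT coefficients of one 8x8 block."""
    coeffs = dctn(block.astype(float), norm="ortho")
    mask = middle_band_mask()
    coeffs[mask] += gain * pattern[: int(mask.sum())]
    return idctn(coeffs, norm="ortho")

block = np.random.randint(0, 256, (8, 8)).astype(float)
pattern = np.random.choice([-1.0, 1.0], size=64)
marked = embed_in_middle_band(block, pattern)
print("mean absolute pixel change:", float(np.abs(marked - block).mean()))
```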

I. J. Cox has considered watermarking as communications with side information [11]. The DCT allows an image to be broken up into different frequency bands, making it much easier to embed watermarking information into the middle frequency bands of an image. The algorithm achieves good robustness against compression and other signal processing attacks owing to the selection of perceptually significant transform domain coefficients. Robustness and the quality of the watermark could be improved if the properties of the host image were similarly exploited.

M. Barni has embedded a pseudo-random sequence of real numbers with a normal distribution of zero mean and unit variance into a selected set of DCT coefficients [12]. The watermark is robust to several signal processing operations, including JPEG compression, low-pass and median filtering, dithering, etc., but it does not resist geometric translations. Mitchell et al. have computed a frequency mask for each block [13]. The resulting perceptual mask is scaled and multiplied by the DCT of a pseudo-noise sequence that is different for each block, and this watermark is then added to the corresponding DCT block. The watermark is robust to several distortions including white and coloured noise, cropping, etc. For JPEG coding at 10% quality the quality of the original image degrades.

Fig. 9. Diagram of New Watermarking Technique [14]

#### **ii. Discrete wavelet transform**


This technique is also called the multiresolution technique. Its important aspect is that the watermark is introduced in perceptually significant regions of the data in order to remain robust. It decomposes the image into frequency bands using wavelets at several resolutions. X. Xia in 1997 proposed the concept of a multiresolution watermark for digital images using the wavelet transform [14]. An image can be decomposed into a pyramid structure with various bands of information, such as the low-low, low-high, high-low and high-high frequency bands. Watermarks added to the large coefficients (in the HH, LH, HL and LL bands) are difficult for the human eye to perceive. If the distortion of a watermarked image is not serious, only a few bands' worth of information are needed to detect the signature, and the computational load is therefore reduced. The method is robust to many kinds of distortion such as compression, additive noise, etc. If the distortion of the watermarked image is larger, more DWT bands are needed to detect the watermark and the computational load increases.
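As a rough sketch of wavelet-domain embedding of the kind surveyed here, assuming the PyWavelets package is available; the band choice (HH) and gain are illustrative and not Xia's exact method:

```python
import numpy as np
import pywt

def dwt_embed(image: np.ndarray, watermark: np.ndarray, gain: float = 2.0) -> np.ndarray:
    """Add a small watermark pattern to the HH detail band of a one-level Haar DWT."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), "haar")
    cD = cD + gain * watermark[: cD.shape[0], : cD.shape[1]]   # mark the diagonal (HH) band
    return pywt.idwt2((cA, (cH, cV, cD)), "haar")

img = np.random.randint(0, 256, (64, 64)).astype(float)
wm = np.random.choice([-1.0, 1.0], size=(32, 32))
marked = dwt_embed(img, wm)
print("max pixel change:", float(np.abs(marked - img).max()))
```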

X. Liang and Wu Huizhong have proposed multiple perceptual watermarks using multiple-based number conversion in the wavelet domain [15]. A multiple-watermark coding and decoding system for image copyright protection is presented. The Just Noticeable Difference (JND) threshold in the wavelet domain is used to determine the locations for embedding. A multiple-based number system (in which every digit of the number has its own base bi) is used to convert the watermark information into values to be embedded in the wavelet coefficients. The method has good robustness to JPEG compression, median filtering, Gaussian noise suppression, cropping and morphing types of distortion. The watermark strength is higher because the JND is used, but the method fails against the StirMark attack.

The limitations of the wavelet transform have been overcome with the dual-tree complex wavelet transform. Lan Hong xing et al., in the paper ''A Digital Watermarking Algorithm Based on Dual-tree Complex Wavelet Transform'' [16], have proposed a multipurpose watermarking algorithm based on the dual-tree complex wavelet transform. The authors use a visible watermark to identify the copyright owner and an invisible watermark to protect the copyright. The dual-tree DWT has a relatively high capacity, which makes the visible watermark hard to remove and the invisible watermark robust. The only difficulty is in redesigning the watermark with perfect reconstruction properties; reconstruction of the image in the +/-45 degree sub-bands introduces only minor visual effects.


#### **iii. Spread spectrum**

Spread spectrum watermarking is one of the most popular methods of watermarking. In this technique, the watermark bits are randomly scattered in the cover object. This not only ensures that the watermark is robust to attacks but also simplifies the detection algorithm using correlation analysis. Cryptographers believe that spread spectrum (SS) method of watermarking can incorporate a high degree of robustness because the pseudo-random sequences being used in SS watermarking are very difficult to generate without the prior knowledge of the initial state of the random number generator. This secures decoding or removal of the watermark and also provides resistance to cropping. The major drawback of the SS watermarking scheme is that it requires a high gain value , which sometimes tends to alter the cover data file considerably such that it is noticeable. To overcome this problem, the improved spread spectrum (ISS) technique is used. In this technique a feature vector extraction mechanism has been established which enhances the performance by modulating the energy of the inserted watermark to compensate for the signal interference. The ISS technique using the dither quantization is used to enhance the performance of the embedding procedure and improve the overall performance of the watermarking scheme.
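A minimal additive spread spectrum sketch along the lines described above: a key-seeded pseudo-random ±1 sequence, scaled by a gain, is added to the host samples, and detection correlates the (mean-removed) received signal with the same sequence. The gain value and the simple sign-based decision are illustrative assumptions.

```python
import numpy as np

def ss_embed(host: np.ndarray, bit: int, key: int, gain: float = 2.0) -> np.ndarray:
    """Add a key-seeded pseudo-random +/-1 pattern, scaled by `gain`, with sign given by the bit."""
    rng = np.random.default_rng(key)
    p = rng.choice([-1.0, 1.0], size=host.shape)      # spreading sequence
    return host + gain * p * (1.0 if bit == 1 else -1.0)

def ss_detect(received: np.ndarray, key: int) -> int:
    """Correlate the mean-removed signal with the same sequence; the sign gives the bit."""
    rng = np.random.default_rng(key)
    p = rng.choice([-1.0, 1.0], size=received.shape)
    corr = np.sum((received - received.mean()) * p)
    return int(corr > 0)

host = np.random.normal(128.0, 20.0, size=4096)       # stand-in for image samples
marked = ss_embed(host, bit=1, key=42)
print("detected bit:", ss_detect(marked, key=42))     # expected: 1
```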

Spread transform dither modulation method is a transform domain method. The transform methods are more complex, but more robust than the spatial methods. The watermark is inserted into the cover image in a spread-spectrum fashion in the spectral domain, thereby making it robust against signal processing operations. In this case, the feature vector extraction process can be seen as an extension of the spread transform technique (a more general method of spreading watermark information over a host signal than spread spectrum) that is frequently employed on multimedia. To this feature vector a quantization based watermarking algorithm is used. Quantization index modulation (QIM) methods are a class of watermarking methods that achieve provably good rate-distortion-robustness performance.

#### **a. Quantization index modulation**

The process of mapping a large possible infinite set of values to a much smaller set is called quantization. Since quantization reduces the number of distinct symbols that have to be coded, it is central to many lossy compression schemes. A quantizer consists of two mappings: an encoder mapping and a decoder mapping. The encoder divides the range of source values into a number of intervals. Each interval is represented by a codeword. The encoder represents all the source values that fall into a particular interval by the codeword assigned to that interval. As there could be many possibly infinitely many distinct samples that can fall in any given interval, the encoder mapping is irreversible. For every codeword generated by the encoder, the decoder generates a reconstruction value.

Quantizers, or a sequence of quantizers, can be used as appropriate identity-like functions to embed the watermark information. The number of possible values of m determines the number of required quantizers; m acts as an index that selects the quantizer used to represent m. For the case of m ∈ {0, 1} we have a binary quantizer. The following figure illustrates the QIM information embedding process. To embed one bit m, with m ∈ {0, 1}, an image pixel is mapped to the nearest reconstruction point representing the information of m. The minimum distance d_min between the sets of reconstruction points of different quantizers in the ensemble determines the robustness of the embedding:


$$d_{\min} = \min_{(i,j):\, i \neq j} \; \min_{x_i, x_j} \left\| s(x_i; i) - s(x_j; j) \right\|$$

Fig. 10. QIM Scheme

Fig. 11. Quantization Index Modulation

Intuitively, the minimum distance measures the amount of noise that can be tolerated by the system.
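The following sketch illustrates binary QIM as just described: bit 0 uses the lattice of even multiples of a step Δ and bit 1 the lattice of odd multiples, so the two reconstruction sets are offset by Δ and d_min = Δ. The step size is an arbitrary choice for the example.

```python
import numpy as np

DELTA = 4.0  # quantization step (illustrative)

def qim_embed(x: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Map each sample to the nearest even (bit 0) or odd (bit 1) multiple of DELTA."""
    k = np.round(x / DELTA)                          # nearest multiple of DELTA
    wrong_parity = (k.astype(int) & 1) != bits       # parity of k does not carry the bit
    # shift mismatching samples to the neighbouring multiple on the closer side
    k = np.where(wrong_parity, k + np.sign(x / DELTA - k + 1e-12), k)
    return k * DELTA

def qim_detect(y: np.ndarray) -> np.ndarray:
    """Recover bits from the parity of the nearest multiple of DELTA."""
    return (np.round(y / DELTA).astype(int) & 1).astype(np.uint8)

x = np.array([3.2, -7.9, 10.1, 0.4])
bits = np.array([1, 0, 1, 1], dtype=np.uint8)
y = qim_embed(x, bits)
assert np.array_equal(qim_detect(y), bits)
```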

#### **b. Dither modulation**

Dither modulation is a low-complexity realization of QIM that performs better than both the linear methods of spread spectrum and the nonlinear methods of low-bit modulation against squared-error-distortion-constrained intentional attacks. Dither modulation (DM) is the simplest form of quantization index modulation and the most thoroughly analyzed, owing to its ease of practical implementation. Dither modulation systems embed the watermark by modulating the amount of the quantizer shift, called the dither vector, by the embedded signal. The host signal is quantized with the resulting dithered quantizer to form the composite signal. Dithered quantization (or dither modulation) is an operation in which a dither vector d of length L is added to the input x prior to quantization. The output of the subtractive dithered quantization operation is denoted by


$$s_i = Q(x_i + d_i) - d_i, \quad 0 \le i < L$$

Or, using the notation introduced above,

$$\mathbf{s}(\mathbf{x}; \mathbf{m}) = \mathbf{Q}(\mathbf{x} + \mathbf{d}(\mathbf{m})) - \mathbf{d}(\mathbf{m})$$

For our discussion, we consider only a uniform, scalar quantizer with step size Δ. The binary dither ensemble can be generated pseudo-randomly by choosing di(1) with a uniform distribution over [–Δ/2, +Δ/2] and assigning di(2) as follows:

$$d_i(2) = \begin{cases} d_i(1) + \frac{\Delta}{2}, & \text{if } d_i(1) < 0 \\ d_i(1) - \frac{\Delta}{2}, & \text{if } d_i(1) \ge 0 \end{cases}$$

where 0 ≤ i < L. For the single embedding case (Figure 12(a)), let the QIM embedding logic be converting an element to the nearest even/odd multiple of the quantization interval Δ to embed 0/1, respectively.

Fig. 12. QIM based Information hiding for single and double embedding

For hiding, we use quantized discrete cosine transform (DCT) coefficients. For perceptual transparency, we do not modify coefficients that are too close to zero; hence, all coefficients in the range [-0.5, 0.5] are mapped to zero and are regarded as erasures.

The two quantizers used for double embedding (Figure 12(b)) have quantization intervals of Δ and Δ/2, respectively. In the example (Figure 12(b)), Δ = 1 and the DCT coefficient (P) equals 1.4. Let the first bit to be embedded be 1 (using the coarser quantizer) and the second bit be 0 (using the finer quantizer). To embed 1, the coefficient (1.4) is changed to the nearest odd multiple of Δ, i.e. 1. For the second bit, the coefficient is decreased/increased by Δ/4 to embed 0/1, respectively; to embed 0, the coefficient is changed from 1 to 0.75.

Although it is now well accepted that binning methods (QIM) are better suited for high-capacity hiding, SS techniques continue to receive a lot of attention because of their perceived advantage in achieving robustness. QIM-based schemes provide robustness against several attacks while embedding a large number of bits. The subtractive dither quantization error (SDQE) does not depend on the quantizer input when the dither signal d has a uniform distribution within the range of one quantization bin (di ∈ [–Δ/2, Δ/2]), leading to an expected squared error of Δ²/12.
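The worked double-embedding example above (Δ = 1, coefficient P = 1.4) can be checked with a short script; the helpers below are a hypothetical reading of the even/odd-multiple and ±Δ/4 logic described in the text, not code from the chapter.

```python
DELTA = 1.0

def embed_bit1_coarse(p: float, bit: int) -> float:
    """First bit: move p to the nearest even (bit 0) or odd (bit 1) multiple of DELTA."""
    k = round(p / DELTA)
    if (k & 1) != bit:                        # wrong parity: step to the closer neighbour
        k += 1 if p / DELTA >= k else -1
    return k * DELTA

def embed_bit2_fine(p: float, bit: int) -> float:
    """Second bit: shift by -DELTA/4 to embed 0, or +DELTA/4 to embed 1 (finer quantizer)."""
    return p - DELTA / 4 if bit == 0 else p + DELTA / 4

p = 1.4
p1 = embed_bit1_coarse(p, bit=1)   # 1.4 -> 1.0 (nearest odd multiple of DELTA)
p2 = embed_bit2_fine(p1, bit=0)    # 1.0 -> 0.75
print(p1, p2)                      # 1.0 0.75
```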

#### **c. Spread transform**


Spread transform (also called projection) spreads the embedding distortion that would otherwise be concentrated on one coefficient over multiple coefficients. This brings some advantages, such as satisfying peak distortion limitations. This section presents a multiple watermarking method based on the spread transform, in which cover vectors extracted from the cover works are projected onto multiple orthogonal projection vectors. Different watermark signals are then embedded in the different orientations of these orthogonal projection vectors. The embedding and extracting methods are introduced, and their performance is analyzed.

#### **i. Watermark embedding process**

The above discussion suggests the following general procedure for embedding multiple watermarks into the same image.

**1.** Read the input image to be watermarked.

**2.** Extract the cover vectors from the cover image by first dividing the image into blocks of 8×8 pixels and computing the DCT of each block.

**3.** Choose L projection vectors to hide L different watermark signals such that the projection vectors remain orthogonal to each other.

**4.** Embed the different watermarks into the corresponding projected data using dither modulation.

The mark is a watermark sequence of binary values, wi ∈ {0, 1}.

#### **Coefficient selection**

Fig. 13. Watermark Embedding

The proposed algorithm pseudo-randomly selects 8×8 DCT coefficient blocks which are orthogonal to each other. Each block is considered as a vector, and the condition of orthogonality is V1 · V2T = 0.
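One simple way to obtain mutually orthogonal projection vectors of the kind required here is to orthonormalize random vectors with a QR decomposition; the sketch below checks the stated condition V1 · V2T = 0 (the vector length and count are illustrative):

```python
import numpy as np

def orthogonal_projection_vectors(n_vectors: int, length: int = 64, seed: int = 0) -> np.ndarray:
    """Return `n_vectors` orthonormal vectors of the given length (e.g. 8x8 = 64 coefficients)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(length, n_vectors)))
    return q.T                                 # each row is one projection vector

V = orthogonal_projection_vectors(2)
v1, v2 = V[0], V[1]
print("V1 . V2^T =", float(v1 @ v2))           # ~0, i.e. orthogonal
```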


For embedding, each block is first quantized using the JPEG quantization matrix and a quantization factor Q. Quantization is defined as division of each DCT coefficient by its corresponding quantizer step size, followed by rounding to the nearest integer. In this step the less important DCT coefficients are wiped out. This (lossy) transformation is performed by dividing each of the coefficients in the 8x8 DCT matrices by a weight taken from a quantization table. If all the weights are equal, the transformation does nothing, but if they increase sharply away from the origin, higher spatial frequencies are dropped quickly. Most existing compressors start from a sample table developed by the ISO JPEG committee. Subjective experiments involving the human visual system have resulted in the JPEG standard quantization matrix. With a quality level of 50, the matrix renders both high compression and excellent decompressed image quality. If, however, another level of quality and compression is desired, scalar multiples of the JPEG standard quantization matrix (QM) may be used.


Table 1. JPEG standard quantization matrix for quality factor (QF) =50

For a quality level greater than 50 (less compression and higher image quality), the standard QM is multiplied by (100-quality level)/50. For a quality less than 50 (more compression, lower image quality), the standard QM is multiplied by 50/quality level. The scaled QM is then rounded and clipped to have positive integer values ranging from 1 to 255. For example, the following QM yields quality levels of 10 and 90.
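The quality-scaling rule just described can be written down directly. The base matrix below is the standard ISO/JPEG luminance quantization table (quality factor 50), and the function scales, rounds and clips it to [1, 255] as the text specifies:

```python
import numpy as np

# Standard JPEG (Annex K) luminance quantization matrix, quality factor 50
QM50 = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def scaled_qm(quality: int) -> np.ndarray:
    """Scale the QF=50 matrix for another quality level, then round and clip to [1, 255]."""
    scale = (100 - quality) / 50.0 if quality > 50 else 50.0 / quality
    return np.clip(np.round(QM50 * scale), 1, 255).astype(np.uint8)

print(scaled_qm(90))   # finer steps (higher quality)
print(scaled_qm(10))   # coarser steps (lower quality)
```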



Table 2. JPEG standard quantization matrix for quality factor 10



Table 3. JPEG standard quantization matrix for quality factor 90

Then, let fb denote an 8×8 DCT coefficient block, and let fb(m1, n1) and fb(m2, n2) be the selected coefficients within that block. The absolute difference between the selected coefficients is given by:

$$\Delta_b = f_b(m_1, n_1) - f_b(m_2, n_2)$$

In order to embed one bit of watermark information, wi, in the selected block bi, the coefficient pair fb(m1, n1), fb(m2, n2) is modified such that the distance Δb satisfies the condition below, where q is a parameter controlling the embedding strength.

$$\Delta_b \;\begin{cases} \le q, & \text{if } w_i = 0 \\ \ge q, & \text{if } w_i = 1 \end{cases}$$

In the proposed method the two watermarks are embedded using the DM method with a uniform, scalar quantizer of step size Δ, where Δ is the quantization step used to control the embedding distortion. This method is called double spread transform dither modulation (DSTDM). Figure 14 shows the realization of DM, where x0 is the original data, xw is the watermarked data and qΔ(·) is the basic quantizer function, that is

$$\mathbf{q}\_{\Delta}(\mathbf{x}) = \text{round}(\mathbf{x}/\Delta) \times \Delta$$

where Δ is the quantization step used to control the embedding distortion (each coefficient's quantization step can differ from the others), and d[m] is the dither value corresponding to the watermark information m.

Fig. 14. Watermark Embedding Process of DM
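A minimal spread transform dither modulation sketch built from the quantizer qΔ and dither d[m] just defined: the block's coefficients are projected onto a spread vector, the projection is dither-quantized, and the block is adjusted along that vector. The step size, dither values and vector choice are illustrative assumptions rather than the chapter's exact parameters.

```python
import numpy as np

DELTA = 8.0
DITHER = {0: -DELTA / 4, 1: DELTA / 4}        # illustrative dither values d[m]

def q_delta(x: float) -> float:
    """Basic quantizer q_Delta(x) = round(x / Delta) * Delta."""
    return float(np.round(x / DELTA) * DELTA)

def stdm_embed(coeffs: np.ndarray, bit: int, v: np.ndarray) -> np.ndarray:
    """Project onto spread vector v, dither-quantize the projection, adjust along v."""
    v = v / np.linalg.norm(v)
    proj = float(coeffs @ v)
    proj_w = q_delta(proj + DITHER[bit]) - DITHER[bit]    # dithered quantization
    return coeffs + (proj_w - proj) * v                    # move only along v

def stdm_extract(coeffs: np.ndarray, v: np.ndarray) -> int:
    """Minimum distance decoding over the two dither values."""
    v = v / np.linalg.norm(v)
    proj = float(coeffs @ v)
    dist = {m: abs(q_delta(proj + d) - d - proj) for m, d in DITHER.items()}
    return min(dist, key=dist.get)

rng = np.random.default_rng(1)
block = rng.normal(0, 20, size=64)            # e.g. one 8x8 block of DCT coefficients
v = rng.normal(size=64)
marked = stdm_embed(block, bit=1, v=v)
print("extracted bit:", stdm_extract(marked, v))
```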

#### **iv. Watermark extraction method**

In the watermark detection process the embedded watermark signals are extracted using the corresponding extraction method and compared with the original watermark. The extraction method depends on the embedding method used, and the watermark extraction process is the reverse of the watermark embedding process. A minimum distance decoder, similar to the STDM algorithm, is used to extract the watermark. The detailed extraction method of DSTDM is as follows:

**1.** Extract the cover vectors by computing the DCT of the 8×8 pixel blocks of the watermarked image.

**2.** Project the cover vectors onto the same projection vectors used in the embedding process.

**4.** Apply the minimum distance decoding rule to the corresponding dither-modulated values to recover each watermark bit.


Fig. 15. Watermark extraction

The minimum distance decoding rule is

$$m\_i = \arg\min\_{h \in \{0,1\}} \left| W\_{v\_i}[h] - W\_{v\_i} \right|, \qquad i \in \{1, 2\}$$

where Wvi[0] and Wvi[1] represent the dither modulation results of Wvi using d[0] and d[1] as the dither values, Vi is the projection vector, and mi is the ith extracted watermark signal. During the watermark extraction phase, the elements of the signal received at the decoder are quantized using each dithered quantizer. The received message is reconstructed from the indices of the quantizers whose reconstruction points lie closest to those elements; the decoder thus extracts the embedded information mi based on the dither modulation result Wvi. It is well known that inserting a watermark degrades the visual quality of the host (cover) image, and the degree of deterioration depends on the size of the embedded watermark as well as the step size used for DM. To limit this degradation, watermark bits are detected with the minimum distance decoder and the remaining self-noise due to watermark embedding is suppressed, which provides better image quality.

In the case of more than two watermark signals, DSTDM can be generalized to multiple spread transform dither modulation (MSTDM). In this situation, the cover vector extracted from the cover work using Rule 1 is projected onto multiple (for example, M) projection vectors Vi (i ∈ {1, 2, ..., M}) that are orthogonal to each other, and the different watermark signals are then embedded using DM in the different directions. The extraction method of MSTDM is similar to that of DSTDM.
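A minimal sketch of this project-then-quantize idea for mutually orthogonal unit projection vectors, reusing the same assumed dither values as above; all function names are illustrative:

```python
import numpy as np

def q_delta(x, delta):
    return np.round(x / delta) * delta

def dm_embed(w, bit, delta):
    d = [0.0, delta / 2.0]               # assumed dither values, as above
    return q_delta(w - d[bit], delta) + d[bit]

def dm_extract(w, delta):
    d = [0.0, delta / 2.0]
    cands = [q_delta(w - d[m], delta) + d[m] for m in (0, 1)]
    return int(np.argmin([abs(w - c) for c in cands]))

def mstdm_embed(cover_vec, bits, proj_vecs, delta):
    """Embed one bit per mutually orthogonal unit projection vector V_i."""
    x = np.asarray(cover_vec, dtype=float).copy()
    for bit, v in zip(bits, proj_vecs):
        w = x @ v                                  # projection W_vi of the cover vector
        x += (dm_embed(w, bit, delta) - w) * v     # adjust only along direction V_i
    return x

def mstdm_extract(recv_vec, proj_vecs, delta):
    """Decode each watermark bit from its own projection direction."""
    x = np.asarray(recv_vec, dtype=float)
    return [dm_extract(x @ v, delta) for v in proj_vecs]
```

Because the projection vectors are orthogonal, the adjustment made along one direction does not change the projections onto the others, which is why the watermark signals do not interfere.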

#### **4. Statistical measures of image robustness**

The performance of an embedding technique is judged by numerical measures such as the quality of the reconstructed image and the similarity of the extracted information. These are measured with the PSNR and the bit error ratio, respectively.

#### **4.1 PSNR (Peak Signal to Noise Ratio)**


The PSNR computes the peak signal-to-noise ratio, in decibels, between two images. This ratio is often used as a quality measurement between the original and a compressed image: the higher the PSNR, the better the quality of the compressed or reconstructed image. The mean square error (MSE) and the peak signal-to-noise ratio (PSNR) are the two error metrics used to compare image compression quality. The MSE represents the cumulative squared error between the compressed and the original image, whereas the PSNR represents a measure of the peak error; the lower the MSE, the lower the error. To compute the PSNR, first calculate the mean squared error using the following equation:

$$\text{MSE} = \sum\_{M,N} \frac{[\text{I}\_1(m,n) - \text{I}\_2(m,n)]^2}{M \times N}$$

M and N are the number of rows and columns in the input images, respectively. The PSNR is given by the following equation:

$$\text{PSNR} = 10 \log\_{10} \left( \frac{R^2}{MSE} \right)$$

R is the maximum fluctuation in the input image data type. For example, if the input image has a double-precision floating-point data type, then R is 1; if it has an 8-bit unsigned integer data type, R is 255, and so on. A higher PSNR is better because it means that the ratio of signal to noise is higher; here, the 'signal' is the original image and the 'noise' is the error in reconstruction. A compression scheme with a lower MSE (and a higher PSNR) can therefore be recognized as the better one. A PSNR of more than 35 dB is usually considered good quality.
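A minimal sketch of these two measures (function names are ours):

```python
import numpy as np

def mse(img1, img2):
    """Mean squared error between two equally sized images."""
    diff = np.asarray(img1, dtype=float) - np.asarray(img2, dtype=float)
    return np.mean(diff ** 2)

def psnr(img1, img2, peak=255.0):
    """PSNR in dB; peak is R (255 for 8-bit images, 1.0 for double-precision images)."""
    err = mse(img1, img2)
    if err == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```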

#### **4.2 Bit error ratio**

Compare the original binary watermark w with the extracted binary watermark w′; this is equivalent to computing the bit error ratio (BER):

$$\text{BER} = \frac{\text{XOR}(w, w')}{L}$$

where L is the length of the binary bit stream of the watermark.
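A minimal sketch of the BER computation on binary watermark streams (function name is ours):

```python
import numpy as np

def ber(w, w_extracted):
    """Bit error ratio: fraction of positions where the two binary streams differ."""
    w = np.asarray(w, dtype=int)
    w_extracted = np.asarray(w_extracted, dtype=int)
    return np.count_nonzero(np.bitwise_xor(w, w_extracted)) / w.size
```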

#### **5. Summary**

The method presented provides an effective balance between robustness, complexity, and image quality. Multiple watermark signals are embedded in different orientations of the cover vectors extracted from the cover works, so that the different watermark signals do not mutually interfere. Compared with related watermarking techniques, this method yields significant improvements in invisibility and robustness. The proposed method is very flexible and its mathematical background is clear. Experimental results also show that the presented method avoids interference between one watermark signal and another very well, which is one of the most important and difficult problems for a multiple watermarking algorithm, and its achieved validity can be 100%.

**4**

**Real Time Implementation of Digital Watermarking Algorithm for Image and Video Application**

Amit Joshi1, Vivekanand Mishra1 and R. M. Patrikar2

*1Sardar Vallabhbhai National Institute of Technology, Surat*
*2Visvesvaraya National Institute of Technology, Nagpur*
*India*

#### **1. Introduction**

Watermarking is the process of hiding a predefined pattern or logo in multimedia such as images, audio or video in a way that preserves the quality and imperceptibility of the media. The predefined pattern or logo represents the identity of an author or rights holder. In recent years, rapid growth in digital multimedia has been observed: digital data (image, audio, and video) is sent over the World Wide Web (www) without much effort or cost, but security remains the main issue for digital multimedia. In the face of these dramatic changes, the entertainment industry has scrambled to adopt a slew of technologies that allow it to retain the copyright controls provided by the law and harness the new world to increase the size of the industry and enhance the consumer experience.

In recent years, the research community has seen much activity in the area of digital watermarking as an additional tool for protecting digital content, and many excellent papers have appeared over the years (Arun Kejariwal, 2003). Digital watermarking attempts to copyright the digital data that is freely available on the World Wide Web in order to protect the owner's rights. As opposed to traditional, printed watermarks, digital watermarks are transparent signatures. They are integrated within digital files as noise, or random information that already exists in the file; thus, the detection and removal of the watermark becomes more difficult. Typically, watermarks are dispersed throughout the entire digital file such that the manipulation of one portion of the file does not alter the underlying watermark. To provide copy protection and copyright protection for digital image and video data, two complementary techniques are being developed, known as encryption and watermarking. One more data hiding method, closely related to watermarking, is steganography, which was originally a way of transmitting hidden (secret) messages between allies. Various data hiding techniques are available for security; the details of each data hiding technique are presented in the next section.


