### 2.1 1D discrete Fourier transform

When designing a robust and dependable embedding system, security concerns always come to the forefront. Embedding the same watermark repeatedly in every frame of the host video can make it difficult to maintain the

## DWT-Based Data Hiding Technique for Videos Ownership Protection DOI: http://dx.doi.org/10.5772/intechopen.84963

statistical invisibility, which is an important condition of every security system [15]. Applying independent watermarks to every frame also causes a security problem if the frames contain little or no motion: the motionless regions in successive video frames can be statistically compared or averaged to remove the independent watermarks. Attacks of this kind are normally called collusion attacks. Inter-frame collusion attacks, for instance, exploit the repetition in the video frames and their scenes, or in the watermarks themselves, to produce a false copy of the video that carries no watermark; these attacks can be divided into the watermark estimation remodulation (WER) attack and the frame temporal filtering (FTF) attack [16]. Classifying the video frames according to the amount of motion in them is useful in this regard. Motion in videos is relative, since most videos contain some motion; what interests us here is the amount of motion, how fast it is, the relative motion with respect to the surroundings, and the distribution of the motion across the frames. Most video compression techniques use inter-frame motion estimation to encode the frames; however, other methods can be used to detect static and dynamic scenes in videos. One such method can be built on the 1D discrete Fourier transform (DFT). The 1D DFT in the temporal direction transforms a group of pictures (GOP) into the temporal frequency domain; in the resulting domain, both the spatial and the temporal frequency information of the video frames exist in the same frame. Higher frequencies reflect fast motion from one frame to the next [17]. The 1D DFT of a video f(x,y,t) of size M×N×T, in which M×N is the size of each video frame and T is the number of frames grouped in one GOP, is given by

$$F(u, v, \tau) = \sum\_{t=0}^{T-1} f(x, y, t) e^{-j2\pi(t\tau/T)} \tag{3}$$

where u and v represent the spatial domain of the video frames, while τ represents the temporal frequency of these frames; because the transform acts only along t, (u, v) refer to the same pixel positions as (x, y). Normally a GOP is taken as five frames or a similar number. On that basis, a group of so-called spatiotemporal frames can be constructed for the Foreman video. Twenty-five frames of the Foreman video were transformed using the 1D DFT, and since the DFT is symmetric within one GOP, it is sufficient to show only the first spatiotemporal frame of each GOP. Figure 3 shows the first frame of the Foreman video, while Figure 4 shows the five temporal frames of this video that were evaluated from the original 25 frames; their norms are shown as well. The edges seen in these frames correspond to high frequencies, which reflect the motion in the temporal domain and how this motion is distributed in each frame; furthermore, the values of the evaluated norms reflect how much and how fast the motion in each GOP is. For instance, the intensity of the edges shows the movement of the head and the relative motion with respect to the building; the background also shows some motion, which corresponds to a moving camera, and that is exactly the case here.

Figure 3. The first frame of the Foreman video.

Figure 4. The 1D DFT of 5 GOPs of 25 frames of Foreman video and their corresponding norms.

who is the sole owner. Furthermore, the security of the system also depends partially on the degree of randomness of the pseudorandom sequence that is used in the encoding process. On the other hand, the Y components of the color space were chosen intentionally because they have higher resolution and therefore higher hiding capacity, although the U and V components can likewise be used. As mentioned in the introduction, our techniques will be used when the HEVC process is applied; Figure 2 shows the proposed hiding process when the HEVC (H.265) process is applied to the video that is watermarked.

Figure 1. The block diagram of the proposed watermarking method.

Figure 2. The block diagram of the watermarking process with the application of the HEVC process.

Wavelet Transform and Complexity

Based on the previous analysis of the videos using the 1D DFT and the classification of the video frames into dynamic and static frames, a significant enhancement can be added to the hiding process in terms of both security and reliability. Using this analysis, different binary watermarks are embedded in motion frames, and the same binary watermark is embedded in motionless ones. Since some repetition of the watermark is needed to enhance the detection process, this method provides it without weakening the algorithm against the statistical estimation methods used in steganalysis, for instance; moreover, repeating the watermark in motionless frames increases the cohesion of the watermarked video sequence. Furthermore, the bands being used are not confined to the high-frequency ones; the effect of averaging and collusion attacks is reduced as well. The 1D DFT is not the only way to obtain motion information; the 3D DWT, for instance, can be used to construct spatiotemporal components of video frames. Choosing the proper method to determine motion in frames depends primarily on the application and on other factors such as computational complexity. Since we only need to estimate motion, not to measure it strictly and precisely, the 1D DFT meets our needs at this stage.
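The temporal transform of Eq. (3) and the static/dynamic classification described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the use of the τ = 1 temporal frame as the motion measure is an assumption, and the threshold is a free parameter, since the chapter does not state how the norms are turned into a decision.

```python
import numpy as np


def temporal_dft_gops(frames, gop_size=5):
    """1D DFT along time for each group of pictures (GOP).

    frames: array of shape (num_frames, M, N), e.g. the Y component.
    Returns complex coefficients of shape (num_gops, gop_size, M, N).
    """
    frames = np.asarray(frames, dtype=float)
    num_gops = frames.shape[0] // gop_size  # trailing frames are ignored
    gops = frames[: num_gops * gop_size].reshape(
        num_gops, gop_size, *frames.shape[1:]
    )
    # np.fft.fft uses the kernel exp(-j*2*pi*t*tau/T), as in Eq. (3).
    return np.fft.fft(gops, axis=1)


def motion_norms(spectra):
    """Frobenius norm of the tau = 1 temporal frame of each GOP.

    Assumption: tau = 1 is used as the motion measure because tau = 0
    is only the sum of the frames and carries no temporal variation.
    """
    return np.linalg.norm(spectra[:, 1], axis=(1, 2))


def classify_gops(norms, threshold):
    """Label each GOP 'dynamic' or 'static' using a hypothetical threshold."""
    return ["dynamic" if n > threshold else "static" for n in norms]
```

For 25 frames and `gop_size=5`, `temporal_dft_gops` returns five GOP spectra. Under the scheme described above, a GOP labelled static would receive the shared watermark, and a GOP labelled dynamic would receive a different one.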

## 3. Watermark detection process

The extraction process depends mainly on the hiding process, so we perform the reverse process. This is a blind watermarking method; hence, knowing the original watermark image is not a requirement, but knowing the reconstruction (synthesis) filter banks and the generated pseudorandom sequence is still required to extract the hidden watermark. To get the hidden watermark, a prediction and estimation process of the original pixel values is required [14]. This process should also take into account that different types of processing will take place, such as lossy compression, additive noise, and geometrical operations; this, in turn, makes detection challenging. Furthermore, security is a critical concern, as attackers attempt to learn or destroy the hidden watermark. To cope with these difficulties, an enhanced detection and estimation process is developed.

A noise elimination method can be developed to estimate the original pixels; to do that, the extracted coefficients can be smoothed using a spatial convolution mask of size 5×5. In fact, the 5×5 mask gave higher performance than the 3×3 mask when our videos were subjected to noise and compression. Moreover, the selective denoising filter presented in Section 4 gave good results in removing the noise and smoothing the extracted image. Using a subtraction operation, as opposed to the addition used in Section 2, and setting the positive and negative values to +1 and −1, respectively, a coarse version of the watermark can be extracted. The enhanced detection process then uses multiple extracted watermarks, which were embedded randomly in different video frames in the first place, for the final estimation. It was shown in the previous section that either the same watermark or multiple watermarks can be used depending on the changes in the scenes, and that the 1D DFT is helpful in determining these changes. This means that we are not sacrificing security when using the same watermark in a random manner; on the contrary, we are increasing robustness against detection or manipulation attempts. Let us assume that the extracted watermarks are grouped in a set W = {w<sub>1</sub>, w<sub>2</sub>, …, w<sub>n</sub>}. To choose the watermarks that can be used in the final estimation process, a cross-correlation test can be performed between every two extracted watermarks w<sub>i</sub> and w<sub>j</sub>. The normalized cross-correlation coefficient between two matrices A and B is given by the following equation:

$$R = \frac{\sum\_{m} \sum\_{n} \left( A\_{mn} - \overline{A} \right) \left( B\_{mn} - \overline{B} \right)}{\sqrt{\left( \sum\_{m} \sum\_{n} \left( A\_{mn} - \overline{A} \right)^{2} \right) \left( \sum\_{m} \sum\_{n} \left( B\_{mn} - \overline{B} \right)^{2} \right)}} \tag{4}$$

where $\overline{A}$ and $\overline{B}$ are the means of A and B, respectively. The attacks that the videos are subjected to are of different natures and scopes; they can be divided into geometrical, statistical, additive noise, etc. Hence, the extracted watermarks can be regarded as noisy versions of the originally hidden ones or, equivalently, as noisy signals. The cross-correlation test gives a good indication of the similarity between two signals, and it can be applied to our extracted watermarks, which are expected to be similar to some degree. Based on this statistical analysis, it is possible to establish a new set of extracted watermarks W<sub>1</sub>.
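Equation (4) translates directly into a few lines of NumPy. The sketch below is illustrative only; the function name is hypothetical, and returning 0 when either matrix is constant (where the denominator vanishes and R is undefined) is an assumption made for the example.

```python
import numpy as np


def normalized_cross_correlation(a, b):
    """Normalized cross-correlation coefficient R of Eq. (4)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()  # A_mn minus the mean of A
    db = b - b.mean()  # B_mn minus the mean of B
    denom = np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))
    if denom == 0.0:
        # Assumption: R is undefined for a constant matrix; report 0.
        return 0.0
    return float(np.sum(da * db) / denom)
```

Two identical binary watermarks give R = 1, a watermark and its bitwise complement give R = −1, and each flipped pixel moves R away from 1.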

Depending on the resulting cross-correlation value, it is possible to get the set of coefficients that can be used in our final decision-making process. Hence, the final watermark set W<sub>1</sub> can be established. On the other hand, a low correlation value means that the coefficients are too corrupted, and they are therefore excluded from the final set. This cross-correlation process can be seen in Figure 5: Figure 5(a) shows a plot of the cross-correlation matrix between two sets of coefficients that are highly correlated, which means they can be included in the final set, while Figure 5(b) shows the opposite case, where the coefficients are corrupted. To establish a good estimation process, a threshold should be defined for the cross-correlation value, and the decision can be made accordingly.

Figure 5. 3D plots of the cross-correlation matrices of two extracted watermarks.

Figure 6. Expected values of the input and output binary images.

Since the cross-correlation between binary images is a measure of similarity between these images, flipping the value of any pixel will reduce this similarity. If w<sub>i</sub> ∈ W, then a cross-correlation process is performed between w<sub>i</sub> and all the other extracted watermarks in the set; then the average cross-correlation is evaluated. The same process is repeated for all the watermarks in the set. A set that includes each extracted watermark and its corresponding average correlation value is established. Then, by establishing a threshold value h for the average cross-correlations, the extracted watermarks that do not pass the threshold test are excluded from the new set W<sub>1</sub>. The final extracted watermark w<sub>e</sub> can be evaluated by performing an averaging process on the watermarks in the set W<sub>1</sub>, where

$$w\_{e} = \mathrm{Ave}\{W\_1\} \tag{5}$$
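The selection and averaging steps leading to Eq. (5) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the watermarks are assumed to be 0/1 arrays of equal size, and the averaged image is binarized at 0.5, a step the chapter does not specify.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation coefficient R of Eq. (4)."""
    da = np.asarray(a, dtype=float) - np.mean(a)
    db = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))
    # Assumption: a constant image has undefined R; treat it as 0.
    return float(np.sum(da * db) / denom) if denom > 0 else 0.0


def estimate_watermark(extracted, h):
    """Build W_1 by thresholding average correlations, then average it.

    extracted: list of at least two 0/1 arrays (the set W).
    h: threshold on the average cross-correlation.
    Returns (w_e, kept), where kept lists the indices that form W_1.
    """
    n = len(extracted)
    if n < 2:
        raise ValueError("at least two extracted watermarks are required")
    # Average correlation of each w_i with all the other watermarks.
    avg_corr = [
        np.mean([ncc(extracted[i], extracted[j]) for j in range(n) if j != i])
        for i in range(n)
    ]
    kept = [i for i, r in enumerate(avg_corr) if r >= h]
    if not kept:
        raise ValueError("no extracted watermark passed the threshold")
    # Eq. (5): average the surviving watermarks, then binarize (assumed step).
    mean_image = np.mean([extracted[i] for i in kept], axis=0)
    return (mean_image > 0.5).astype(int), kept
```

With several mildly corrupted copies and one heavily corrupted copy, the heavily corrupted copy has a low average correlation and is dropped, while averaging the remaining copies cancels many of the independent pixel flips.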

The averaging process is motivated by the fact that binary sets follow a specific statistical pattern. The correlation coefficient R between any two arbitrary matrices A and B is given in Equation 4; the mean value of a binary image A is at the same time the expected value of A, E(A). Assuming that, at the input, the probability of 1 is p<sub>1</sub> and that the probability of flipping the value is p, as shown in Figure 6, then

$$
\overline{A} = E(A) = p\_1 \tag{6}
$$

Moreover, the probability of having 1 at the output B is p<sub>1</sub> ∗ (1 − p) + (1 − p<sub>1</sub>) ∗ p, and by taking Equation 6 into consideration, this can be rewritten as

$$\overline{B} = E(B) = E(A) + (1 - 2 \ast E(A)) \ast p \tag{7}$$

Assuming that we are using a specific binary watermark, the input matrix A is constant during our watermarking process. This means that in the above equation, the flipping probability of the pixels p is the sole variable. Moreover, comparing Eqs. (4) and (7) shows that the correlation between the two matrices A and B depends on the flipping probability of the pixels; hence, the flipping effect is reduced to some extent by averaging the extracted watermarks. An enhanced version can be built as long as p is not equal to 0.5, which corresponds to unit entropy. To illustrate this, Figure 7 shows the changes in these parameters when a random binary watermark is subjected to Gaussian noise with zero mean and different variances; in this figure, the noise variance is referred to as density for illustration purposes.

Figure 7. The enhanced correlations vs. noise density.

## 4. Noise removal selective filter

One of the challenging aspects of video encoding and watermarking is the additive noise that results in distorted video streams. The nature of the additive noise depends primarily on its source. Not only does the additive noise tend to distort the visual quality of the video in question, but it also has a noticeable impact on the watermarking process. One type of noise that is common in video processing is salt-and-pepper (S&P) noise. This type of noise can be added to the video frames during transmission when the communication channels are noisy, or it can result from hardware-generated errors during the encoding and decoding processes. Removing the noise without disturbing the watermarking process on the one hand and preserving the visual quality on the other hand is challenging. As far as the watermarking process is concerned, it is useful to check the effects of both the additive noise and the removal process on our data hiding process. Many methods have been proposed to eliminate the noise or enhance the visual appearance of images [18, 19]; these methods depend mainly on the idea of median filters. Normal median filters, for example, which are used to eliminate salt-and-pepper noise in images, filter the whole image regardless of the presence or absence of noise in a certain area. This reduces the original resolution of the image to such an extent that the quality of high-definition (HD) videos is lost. This means that our watermarking process would not achieve the visual quality
