**5. Results and discussions**

Although there is no standard benchmark for document watermarking systems, we present results for the common concerns in watermarking electronic documents: watermark imperceptibility, tamper detection capability, and practical considerations.

## **5.1 Watermark imperceptibility**

Since electronic documents are not images, the distortion caused by the watermarking process cannot be assessed with common distortion measures such as the Peak Signal-to-Noise Ratio (PSNR) or the Mean Absolute Error (MAE). For this reason, the distortion assessment was carried out using a Mean Opinion Score (MOS) evaluation.

The MOS evaluation was set up as follows: twenty pairs of different documents (each pair consisting of the original and the watermarked document) were shown to 100 observers, whose gender and age distribution is described in Tab. 1.


| Age (years) | Female | Male |
|---|---|---|
| 20-30 | 33 | 32 |
| 30-40 | 4 | 10 |
| 40-50 | 2 | 7 |
| 50+ | 3 | 9 |

Table 1. Age and gender distribution of MOS observers.

The observers were asked to assess the difference between the original and watermarked documents and to assign a score according to Tab. 2. The average MOS was 4.6, which supports the imperceptibility of the watermark. The observers gave the following reasons for scores other than 5:

- The text is misaligned with respect to the paper sheet.
- The paper whiteness is slightly different.
- The ink of the letters is uneven.

Since the observers were aware that they had to find differences, they pointed out what they thought could be the difference; even when these differences did exist, they were caused directly either by the printer or by the composition of the paper.

| Score | Meaning |
|---|---|
| 5 | There is not any perceptible difference |
| 4 | There is a slight difference that can be ignored |
| 3 | There is a slight difference which cannot be ignored |
| 2 | There is a noticeable difference |
| 1 | The difference between the two documents is evident |

Table 2. MOS evaluation criteria.


To further support the MOS results, we measured the distortion of the watermarked metrics relative to the original metrics (see Fig. 10). When a character with a high ASCII value appears in the document, the distortion becomes larger, although it remains too small to cause a significant visible change.

Fig. 10. Error percentage for each character in the ASCII code for some random watermark; the maximum distortion is about 16 %.

Fig. 11 shows a piece of a document and its watermarked version.

## **5.2 Tamper detection capability**

Consider two ways of tampering with a document. In the first, the attacker changes characters at will without changing the metrics, expecting that this will not damage the watermark. An attack carried out this way produces a document like the one shown in Fig. 12: it is quite evident that some modifications were made, so any human can easily detect the tampering even if the original document is not available for comparison.

In the second, the attacker knows the file standard and therefore has the skills needed to modify the document while preserving its natural look. To achieve this, the attacker must re-compute the metrics related to the tampered characters and, as expected, the more characters are tampered, the greater the damage to the watermark. Fig. 13 shows a typical instance of this behaviour: once the correlation value d is below the threshold value, it never surpasses it again. Furthermore, even where the threshold seems to follow a parabola-like shape and decreases at some point, the correlation value remains below the threshold. A close-up of Fig. 13 is shown in Fig. 14, where we can see the point at which the correlation goes below the threshold; in this case, this happens when about 0.6% of the characters are tampered.

Authentication of Script Format Documents Using Watermarking Techniques 251

Fig. 11. Sample documents. a) Original document. b) Watermarked document.

Fig. 12. Example of a malicious modification; only the characters were changed whilst the metrics remain unchanged. The modifications can be easily spotted.


Fig. 13. System response as the percentage of tampered characters varies from 0% to 100%.

Fig. 14. System response as the percentage of tampered characters varies from 0% to 3.125%.

Tab. 3 presents results for 10 different documents, showing the percentage of characters that had to be tampered before the system considered the document tampered. High values in the table are explained as follows: as seen in Fig. 13 and Fig. 14, the correlation value does not decrease monotonically because the metrics are highly correlated to the watermark; this causes oscillations, especially at low percentages of tampering, so the reported percentages are those beyond which the correlation no longer crosses the threshold.

Table 3. Minimum percentage of altered characters at which the system determines that the document is tampered.

## **5.3 Practical considerations**

The system described above has very low complexity: embedding a watermark of length N requires 5N multiplications. The average execution time on a consumer laptop is depicted in Fig. 15. The system clearly meets a wide spectrum of practical needs: it can process a document with hundreds of pages in a few seconds, which should be good enough for most practical scenarios.

Fig. 15. Execution times for documents as the number of characters varies.

**6. Conclusions**

Through the development of this work, the following conclusions can be reached. Watermarking DDS format documents is a feasible, low-complexity task that yields a reliable electronic document authentication scheme with many desirable characteristics, such as imperceptibility and very good tamper detection capabilities. Recall that many works in the field of document authentication treat electronic documents as binary images; thus, the development of watermarking systems in script format is a rich research field.

Results show that watermark imperceptibility is achieved to a high degree, as described in Section 5.1, and, considering the results of the MOS test, we can conclude that the proposed watermarking system will meet almost any imperceptibility requirement. Another important achievement is the tamper detection capability, which proved to be reliable even in the worst case of our tests. However, if this is a concern, future work could perform verification in smaller blocks. For example, verification could be done in streams of 100 characters, so that the 22.7% of characters that must be tampered corresponds to about 23 characters; 23 characters altered out of 100 is more likely to be a harmless modification, since it would be more difficult to mount an attack useful to the purposes of any attacker.

Finally, the scheme discussed in this chapter is not intended to replace any security measures implemented in the different electronic document schemes, such as those implemented in ODF or PDF; rather, it is advised as a complement to the current ones, so that a more secure electronic document model can be achieved.

**7. Acknowledgments**

The authors would like to thank the Council of Science and Technology (CONACYT) of Mexico and the National Polytechnic Institute (IPN) of Mexico for supporting this work.

Examples in this chapter were chosen to mention the C language in memory of its creator, Dennis Ritchie, who passed away on October 12th, 2011. The C language was extensively used during the development of this research.
