**1. Introduction**

236 Emerging Informatics – Innovative Concepts and Applications

Electronic document authentication is a subject of active research because the release of very efficient programs for document, image, and video processing has made the manipulation of such digital content easier. Consequently, developing efficient methods that protect sensitive digital material against unauthorized manipulation, without degrading the original material, is a very important task that has found application in many practical problems in the financial, banking, insurance, legal, and government fields, among others.

Thus, digital content authentication and protection algorithms for several practical applications have been proposed during the last decade. Some of them use fragile or semi-fragile watermarking algorithms, fingerprints for document leakage investigations, and robust watermarks for copyright protection.

Most of these schemes treat the document to be protected as an image, without taking into account that, in a more natural scenario, a digital document is in fact stored in an electronic format such as PDF, PostScript, or Word files, especially with the increasing use of digital signatures.

This chapter presents an authentication scheme for script format digital documents using watermarking techniques that are capable of accurate verification, making it possible to detect malicious and unauthorized document manipulations. The remainder of this chapter is organized as follows. First, similar works on document watermarking are reviewed, followed by detailed background in Sections 2 and 3. The document watermarking approach is then presented in Section 4 and the results in Section 5. Finally, some conclusions discuss the main achievements of this watermarking approach, and the references used in this chapter are listed at the end.

### **1.1 Previous works**

Several schemes have been developed to authenticate digital documents by embedding an invisible watermark into them, most of them treating the digital document as a binary image. Yang and Kot proposed a document authentication scheme in which an authentication code is embedded by changing the size of the spaces between consecutive words and characters (Yang & Kot, 2004). The main drawbacks of this scheme are its high computational complexity and its vulnerability to noise.

Huang proposed an authentication method for binary images, including text documents (Huang et al., 2004), in which the binary image is first segmented into blocks and then some pixels in each block are rearranged in order to enforce a given relationship between the total numbers of black and white pixels in it. During the authentication process, this relationship is verified for each block: if it is satisfied, the block is considered authentic; otherwise, the block is considered tampered. The principal disadvantage of this method is that the degradation introduced in the encoded binary image is noticeable.

Wu and Liu proposed a block-wise authentication scheme for binary images, in which flippable pixels in each block are manipulated in order to embed a watermark bit in the block (Wu & Liu, 2004). Here the embedded watermark is imperceptible, because flipping flippable pixels does not cause any distortion of the binary image. However, in general, the watermark embedding payload is very low compared with the number of flippable pixels in the image.

To improve the embedding payload, Gou and Wu introduced the concept of "super-pixels" and wet paper coding into Wu and Liu's scheme (Gou & Wu, 2007). The "super-pixels" form a set of individually non-flippable pixels, which can be removed or added together without causing visual distortion. Wu and Liu also reported that their authentication scheme is robust to printing and scanning operations. However, during the scanning process, a rotation, even by an angle smaller than one degree, may result in the loss of the embedded watermark signal.

Document authentication schemes for formats such as Portable Document Format (PDF) or PostScript have received little attention among researchers, although many official documents are stored in this type of format. In (Zhu et al., 2007), a document authentication method using render sequence encoding is proposed, in which the encoding process is based on modulating the display sequences of a Document Description Language (DDL), such as PostScript, PDF, or Printer Control Language. In the render sequence, predefined characters are permuted according to a user's secret key; during the authentication process, the document is considered authentic if the permutation corresponds to the secret key used in the embedding stage. This scheme correctly determines whether a document is authentic or not; however, two inconveniences may limit its practical use. First, the size of the encoded document file is considerably increased compared with the original file size. Second, the structure of the encoded render sequence is unnatural and, as a consequence, it can be easily detected by an unauthorized person, making it possible to use reverse engineering to tamper with the document.

To solve these problems, Gonzalez-Lee proposed a watermarking-based document authentication scheme in which character metrics are used to embed a watermark sequence (Gonzalez-Lee et al., 2009). The advantage of the proposed scheme is that the watermarked file size is not changed compared with the original file size, and the watermarked file also conserves its original appearance, which enhances its security because the presence of the watermark is not evident.

Finally, we would like to discuss the previous work in document security done by the main promoters of electronic document schemes. PDF uses a scheme with several variants of permissions that allow the user to perform different tasks, for example, permissions for printing or for copying portions of the document (done with the Ctrl+C and Ctrl+V shortcuts); a password-protected document will ask for the password when one wants to perform one of the described tasks. Unfortunately, this scheme is tied to Acrobat Reader, and the security can be overridden as easily as by using another PDF viewer, for example the GNOME Document Viewer available in most Linux distributions; that viewer will not ask for any password for printing or for copying portions of the document. Another possibility is that the security relies on hiding the document contents; in this case, the viewer does not allow anyone to see the contents of the document unless the right password is given. Again, this scheme can be easily broken with free tools, for example PDFcrack (Noren, 2008); by using this tool, anyone can break the password within a couple of days with a consumer computer. Once the password is broken, the attacker is able to view the document contents and save an unprotected copy of the document, which can be modified and even saved with the same password, so that the legitimate document is replaced by the tampered document and the user is unaware of it. More on the security model of PDF can be read in (Adobe, 2006).

**2. Document description languages**

Computer languages such as C are general purpose: they can be used for developing a broad spectrum of applications. Others, like Fortran and Matlab, are designed for numerical calculations, so their respective instruction sets greatly facilitate calculations in the engineering field. One can easily think of many useful instructions or functions that facilitate coding complex programs; for example, the function sin(x) is very useful in engineering computing programs, but it is of little use in describing an electronic document. In order to achieve an efficient description of the basic elements that allow the creation of a practical document, we need a computer language that meets the challenge of properly describing an electronic document. Such a language is called a Document Description Language, or DDL for short; thus, a DDL is a computer language whose instruction set is designed to contain commands for the common tasks needed to draw a document.

A DDL is designed to facilitate the description of a document; in other words, its instruction set is very handy for common tasks such as indicating where to draw a given set of characters (e.g. a row or a paragraph), which font size to use, and other properties according to the desired document layout. It is hard to imagine trying to describe a web page using the C or Matlab instruction set, so the scope and purpose of DDLs is evident.

We can mention many implementations of practical DDLs. For example, for describing Web pages we can use the HyperText Markup Language (HTML), and for electronic documentation we can choose among PostScript, Portable Document Format (PDF), and Open Document Format (ODF), used by the OpenOffice.org and LibreOffice projects.

As discussed above, there are many DDLs, and most of them are radically different from one another, which makes it difficult to develop a universal approach that can be used for every DDL. In most cases, a given watermarking approach can be adapted for several DDLs, but in other cases we must design a completely different paradigm.
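To make the notion of a DDL instruction set more concrete, the following minimal sketch generates a one-page PostScript program from a list of text lines. It only illustrates the kind of commands a DDL offers (selecting a font and size, moving to a position, and drawing a string); it is not the watermarking scheme of this chapter. The function names `escape_ps_string` and `text_page`, the default font, and the fixed page coordinates are assumptions chosen for the example, while `findfont`, `scalefont`, `setfont`, `moveto`, `show`, and `showpage` are standard PostScript operators.

```python
def escape_ps_string(text: str) -> str:
    """Escape the characters that are special inside a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_page(lines, font="Helvetica", size=12, x=72, top=720, leading=14):
    """Return a one-page PostScript program that draws `lines` from top to bottom.

    Assumed layout for this illustration: every line starts at horizontal
    position `x`, and each line is placed `leading` points below the previous one.
    """
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    for i, line in enumerate(lines):
        y = top - i * leading
        # "x y moveto" sets the current point; "(text) show" draws the string there.
        out.append(f"{x} {y} moveto ({escape_ps_string(line)}) show")
    out.append("showpage")  # emit the finished page
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(text_page(["Contract No. 17", "Amount due: (USD) 1,250"]), end="")
```

Each generated line states where a run of characters is drawn and in which font, which is precisely the type of information that would be cumbersome to express with the instruction set of a general-purpose language.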
