**3.1 Character metrics**

242 Emerging Informatics – Innovative Concepts and Applications

showpage which is used to mark the end of a page and tell the document interpreter that the page must be drawn). In this example, the actual contents of the page are not shown; a comment is shown instead. The first lines illustrate a header; then the marker %%Page: x x begins page x, and the command showpage marks the end of the page.

In the examples ahead, all this structure will be omitted and only the contents will be shown, in order to keep the examples small and to focus on the parts of the script that are processed.

    %!PS-Adobe-2.0
    %%Creator: Txt2Ps
    %%CreationDate: Fri Jul 9 17:31:33 2010
    %%Title: A Simple Document.
    %%Pages: 2
    %%PageOrder: Ascend
    %%BoundingBox: 0 0 615 792
    %%PaperSize: Letter
    %%BeginSetup
    /Times-Roman findfont
    12 scalefont setfont
    %%EndSetup
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%Page: 1 1
    %% %% Page Contents
    showpage
    . . .
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%Page: N N
    %% %% Page Contents
    showpage

Fig. 2. Example of a basic DDS of PostScript.
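The page structure described above (a header, then pages opened by %%Page: and closed by showpage) can be illustrated with a minimal Python sketch. The function name `split_dds` and the `SAMPLE` script are hypothetical and introduced only for illustration; the sketch assumes each marker sits on its own line and does not attempt to interpret PostScript itself.

```python
def split_dds(ps_text):
    """Split a DSC-style script into header lines and per-page contents.

    Returns (header, pages), where pages is a list of (label, lines) tuples.
    Assumption: '%%Page:' and 'showpage' each appear on a line of their own.
    """
    header, pages = [], []
    in_header = True   # True until the first %%Page: marker is seen
    body = None        # lines of the page currently being read, if any
    for line in ps_text.splitlines():
        s = line.strip()
        if s.startswith("%%Page:"):
            # "%%Page: label ordinal" opens a new page
            in_header = False
            parts = s.split()
            label = parts[1] if len(parts) > 1 else ""
            body = []
            pages.append((label, body))
        elif in_header:
            header.append(line)
        elif s == "showpage":
            body = None            # end of the current page
        elif body is not None:
            body.append(line)      # actual page contents
    return header, pages


# Illustrative input modelled on the structure of Fig. 2.
SAMPLE = "\n".join([
    "%!PS-Adobe-2.0",
    "%%Pages: 2",
    "%%BeginSetup",
    "/Times-Roman findfont",
    "12 scalefont setfont",
    "%%EndSetup",
    "%%Page: 1 1",
    "%% %% Page Contents",
    "showpage",
    "%%Page: 2 2",
    "%% %% Page Contents",
    "showpage",
])

header, pages = split_dds(SAMPLE)
```

Note that the prefix test uses the full marker `%%Page:` including the colon, so header comments such as %%Pages: and %%PageOrder: are not mistaken for page openers.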

In the last section, the basic concepts of DDSs and their role were described; in this section we go deeper into the internals of document description scripts.

Let's first introduce the character metrics concept.

A character metric is the distance between consecutive characters; equivalently, it is the distance that "the cursor" must advance to place the next character. A character has two metrics, called mx and my, which are the distances along the x-axis and the y-axis at which the next character must be placed (see Fig. 3). Since languages have different writing styles, the metrics should reflect this. Thus we can have vertical documents, as in Japanese, in which mx = 0 and my ≠ 0; horizontal documents, as in English, in which mx ≠ 0 and my = 0; and the seldom-used diagonal documents, found mostly in graphic design. Although this class may seem to apply only to line shapes, here we consider any text in which both mx ≠ 0 and my ≠ 0 hold to be a diagonal document. Fig. 4 shows examples of each type of document.

Fig. 3. The character metrics.


Fig. 4. Types of documents: a) horizontal document, b) vertical document and c) diagonal document.

More information on character metrics can be found in (Turner, 2000).
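The cursor interpretation of the metrics can be made concrete with a short Python sketch. The helper names `document_type` and `pen_positions` are hypothetical, introduced here only for illustration; the sketch assumes each character carries one (mx, my) pair expressed in the same units as the page coordinates.

```python
def document_type(mx, my):
    """Classify a character's writing direction from its metrics (mx, my)."""
    if mx != 0 and my == 0:
        return "horizontal"
    if mx == 0 and my != 0:
        return "vertical"
    if mx != 0 and my != 0:
        return "diagonal"
    return "undefined"  # mx = my = 0: the cursor does not advance


def pen_positions(start, metrics):
    """Return the position at which each character is placed.

    start   -- (x, y) position of the first character
    metrics -- list of (mx, my) pairs, one per character
    """
    x, y = start
    positions = []
    for mx, my in metrics:
        positions.append((x, y))  # the character is placed at the cursor
        x += mx                   # then the cursor advances by its metrics
        y += my
    return positions


# Three characters of a horizontal row (illustrative values).
row = pen_positions((52, 742), [(7, 0), (5, 0), (6, 0)])
```

In this sketch the metrics of one character determine where the next character lands, so changing a single metric shifts every later character in the row by the same amount.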

As mentioned above, the actual contents of a page are enclosed in special tags; for text documents, the text is organized in rows. Fig. 5 shows an example of a simple row definition. First, the position of the row within the page is set at (52,742) by the command moveto. Next, the text "C Language History" is the contents of the row, and the vector that follows contains the metrics for each character in the row. Generally, the characters do not fill the page width, so a small constant is added to each metric in order to fit the page width, that is to say, to left- and right-justify the text. Finally, the command xshow indicates that this row must be drawn with the given metrics; however, nothing is actually drawn until a showpage command is encountered.

Fig. 5. Example of an actual row definition.

As depicted in Fig. 5, a row definition offers a rich source of data that can be modified either to hide information, implementing a steganographic system, or to embed digital watermarks. A natural question is whether such modifications could have side effects such as visual distortion; however, each unit of metrics is in fact 1/72 inch, that is to say, a metric of 1.0 = 1/72 inch, so the changes are mostly imperceptible. More about DDS languages can be found in (Adobe, 1999), (Adobe, 2006) and (Reid, 1990).

In the next section, we will discuss a watermarking system that uses character metrics to embed digital watermarks.

Authentication of Script Format Documents Using Watermarking Techniques 245

Preserve as much as possible of the watermark, so an automatic verification system can still detect it and thus validate the document as a legitimate one.

From this situation, the need for a document authentication system based on fragile watermarking is evident: even if the modification of the document is small, the watermark should no longer be detectable.

**4.2 Watermarking using character metrics**

Section 3.1 described the metrics of characters; in this section we discuss a model for watermarking using character metrics. This model is depicted in Fig. 6. In this model, editing software takes the raw text and builds a well-formed DDS from the input data; the editing software uses the instructions in a DDL database so that the resulting DDS follows the file standard. Then the watermarking algorithm embeds a watermark, generated using a secret key, in the resulting script; the final product is a watermarked DDS.

Fig. 6. Watermarking model for electronic documents in a DDS approach.

Many software packages are capable of producing high-quality documents; we will assume that such software is provided by a third party, yet the resulting documents follow some standard. The watermarking system therefore has to be designed to interpret the input DDS in order to process it under this assumption.

Next, we introduce a watermarking scheme that relies on the modification of character metrics for watermark embedding. A question may arise regarding the distortion caused by the metrics modification; here we must consider that a unit of metrics equals 1/72 inch, so small modifications should be negligible.

The watermark $W = \{w_i\},\ i = 1, 2, \ldots, N$, is a binary (-1 or 1) pseudo-random sequence with zero mean and variance 1. Without loss of generality, we will assume that we are dealing with horizontal documents; the extension to vertical and diagonal documents is easily carried out.

The whole document is interpreted, and we can then form two vectors, $C = \{c_i\},\ i = 1, 2, \ldots, N$, and $M = \{m_i\},\ i = 1, 2, \ldots, N$; the former is the vector of the characters of the document, and the latter is the vector of their metrics. The character metrics are first modified as follows:
