**Meet the editor**

Dr. Minoru Mori received the B.E. and Ph.D. degrees in electrical engineering from Tokyo Institute of Technology in 1993 and 2008, respectively. In 1993, he joined Nippon Telegraph and Telephone (NTT), where he was engaged in developing character recognition and image processing systems. He was an assistant manager in the Broadband Business Development Division of NTT-East Corp. from 2003 to 2006 and a manager at NTT Advanced Technology Corp. from 2006 to 2007. Since 2007, he has been a senior research scientist at NTT Communication Science Laboratories. He was also an assistant professor at IREIIMS of Tokyo Women's Medical University from 2008 to 2010. His interests include document analysis, pattern recognition, and image processing. He is a senior member of IEICE.

## Contents

#### **Preface**

Chapter 1 **Statistical Deformation Model for Handwritten Character Recognition**

Seiichi Uchida

Chapter 2 **Character Recognition with Metasets**

Bartłomiej Starosta

Chapter 3 **Recognition of Tifinaghe Characters Using Dynamic Programming & Neural Network**

Rachid El Ayachi, Mohamed Fakir and Belaid Bouikhalene

Chapter 4 **Character Degradation Model and HMM Word Recognition System for Text Extracted from Maps**

Aria Pezeshk and Richard L. Tutwiler

Chapter 5 **Grid'5000 Based Large Scale OCR Using the DTW Algorithm: Case of the Arabic Cursive Writing**

Mohamed Labidi, Maher Khemakhem and Mohamed Jemni

Chapter 6 **Application of Gaussian-Hermite Moments in License**

Lin Wang, Xinggu Pan, ZiZhong Niu and Xiaojuan Ma




## Preface

In the field of document recognition and understanding, scanned paper documents were previously the only recognition target. Recently, various new media such as camera-captured documents, videos, and natural scene images have started to attract attention because of the growth of the Internet/WWW and the rapid adoption of low-priced digital cameras and video cameras. The keys to a breakthrough include character detection from complex backgrounds, discrimination of characters from non-characters, recognition of unique modern or ancient fonts, fast retrieval techniques for large-scale scanned documents, multilingual OCR, and unconstrained handwriting recognition. This book aims to present recent advances, applications, and new ideas relevant to document recognition and understanding, ranging from technical topics such as image processing, feature extraction, and classification to new applications such as camera-based recognition and character-based natural scene analysis. The goal of this book is to present new trends and to serve as a reference source for academic research and for professionals working in the document recognition and understanding field.

> **Minoru Mori** NTT Communication Science Laboratories, NTT Corp., Japan


## **Statistical Deformation Model for Handwritten Character Recognition**

Seiichi Uchida *Kyushu University Japan*

#### **1. Introduction**

One of the main problems of offline and online handwritten character recognition is how to deal with the deformations in characters. A promising strategy for this problem is the incorporation of a deformation model. If recognition can be done with a reasonable deformation model, it may become tolerant to deformations within each character category.

Many deformation models have been proposed, and some of them were designed in an empirical manner. Recognition methods based on elastic matching have often relied on a continuous and monotonic deformation model (Bahlmann & Burkhardt, 2004; Burr, 1983; Connell & Jain, 2001; Fujimoto et al., 1976; Yoshida & Sakoe, 1982). This is a typical empirical model and has been developed according to the observation that character patterns often preserve their topologies. Affine deformation models (Wakahara, 1994; Wakahara & Odaka, 1997; Wakahara et al., 2001) and local perturbation models (or image distortion models (Keysers et al., 2004)) are also popular empirical deformation models.

While the empirical models generally work well in handwritten character recognition tasks, they are not well grounded in actual deformations of handwritten characters. In addition, the empirical models are just approximations of actual deformations and cannot incorporate category-dependent deformation characteristics. In fact, category-dependent deformation characteristics do exist. For example, in category "M", the two parallel vertical strokes are often slanted toward each other. In category "H", in contrast, the same deformation is rarely observed.

Statistical models are better alternatives to the empirical models. The statistical models learn deformation characteristics from actual character patterns. Thus, if a model learns the deformations of a certain category, it can represent the category-dependent deformation characteristics.

The hidden Markov model (HMM) is a popular statistical model for handwritten characters (e.g., Cho et al., 1995; Hu et al., 1996; Kuo & Agazzi, 1994; Nag et al., 1986; Nakai et al., 2001; Park & Lee, 1998). HMM has not only a solid stochastic background but also a well-established learning scheme. HMM, however, has a limitation in regulating global deformation characteristics; that is, because of its Markovian property, HMM can regulate only local deformations of neighboring regions.

This chapter is concerned with another statistical deformation model of offline and online handwritten characters. This deformation model is based on a combination of elastic matching and principal component analysis (PCA) and is also capable of learning actual deformations of handwritten characters. Unlike HMM, this deformation model can regulate not only local deformations but also global deformations. In the following, the contributions of this chapter are summarized.

#### **1.1 Contributions of this chapter**

The first contribution of this chapter is to introduce a statistical deformation model for *offline* handwritten character recognition. The model is realized in two steps. The first step is the automatic extraction of the deformations of character images by elastic matching. Elastic matching is formulated as an optimization problem of the pixel-to-pixel correspondence between two image patterns. The resulting pixel-to-pixel correspondence represents the displacement of individual pixels, i.e., the deformation of one character image from another. The second step is statistical analysis of the extracted deformations by PCA. The resulting principal components, called *eigen-deformations*, represent intrinsic deformations of handwritten characters.

The second contribution is to introduce a statistical deformation model for *online* handwritten character recognition. While the discussion is similar to the above offline case, it differs in several respects. For example, deformations often appear as differences in pattern length. Consequently, online handwritten character patterns have rarely been handled in a PCA-based statistical analysis framework, which assumes that all patterns under analysis have the same dimensionality. In addition, online handwritten character patterns often undergo heavy nonlinear temporal/spatial fluctuation. Elastic matching to extract the relative deformation between two patterns solves these problems and helps to establish a statistical deformation model.

#### **2. Statistical deformation model of offline handwritten character recognition**

#### **2.1 Extraction of deformations by elastic matching**

The first step for statistical deformation analysis of handwritten character images is the extraction of deformations from actual handwritten character images, which can be done automatically by elastic matching. Elastic matching is formulated as the following optimization problem. Consider an *I* × *I* reference character image R = {r<sub>*i*,*j*</sub>} and an *I* × *I* input character image E = {e<sub>*x*,*y*</sub>}, where r<sub>*i*,*j*</sub> and e<sub>*x*,*y*</sub> are *d*-dimensional pixel feature vectors at pixel (*i*, *j*) on R and (*x*, *y*) on E, respectively. Let F denote a 2D-2D mapping from R to E, i.e., F : (*i*, *j*) ↦ (*x*, *y*). As shown in Figure 1, the mapping F determines the pixel-to-pixel correspondence from R to E.

Fig. 1. Elastic matching between two character images.

Elastic matching between R and E is formulated as the minimization problem of the following objective function with respect to F:

$$J_{\mathbf{R},\mathbf{E}}(F) = \|\mathbf{R} - \mathbf{E}_F\|, \tag{1}$$

where E<sub>*F*</sub> is the character image obtained by fitting E to R, i.e., E<sub>*F*</sub> = {e<sub>*x*<sub>*i*,*j*</sub>,*y*<sub>*i*,*j*</sub></sub>}, and (*x*<sub>*i*,*j*</sub>, *y*<sub>*i*,*j*</sub>) denotes the pixel of E corresponding to the (*i*, *j*)th pixel of R under F. In the minimization, several constraints (such as a smoothness constraint and boundary constraints) are often assumed to regularize F.

Let F̃ denote the mapping F that minimizes *J*<sub>R,E</sub>(F) of (1). This mapping F̃ represents the relative deformation of the input image E from the reference image R. Specifically, the deformation of E is extracted as the following 2*I*<sup>2</sup>-dimensional vector, called the *deformation vector*:

$$\mathbf{v} = \left( (1 - x_{1,1}, 1 - y_{1,1}), \dots, (i - x_{i,j}, j - y_{i,j}), \dots, (I - x_{I,I}, I - y_{I,I}) \right)^T. \tag{2}$$

Note that v is a discrete representation of F̃.

The constrained minimization of (1) with respect to F (i.e., the extraction of v) can be carried out with various optimization strategies. If the mapping F is defined as a parametric function, iterative strategies and exhaustive strategies are often employed for optimizing the parameters of F. In contrast, if the mapping F is a non-parametric function, combinatorial optimization strategies, such as dynamic programming, local perturbation, and deterministic relaxation, are employed. Various formulations and optimization strategies of the elastic matching problem are summarized in Uchida & Sakoe (2005).
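As a concrete illustration of (1) and (2), the following minimal Python/NumPy sketch evaluates the matching cost for a given integer-valued mapping and assembles the deformation vector v. It does not perform the optimization itself. The function names `matching_cost` and `deformation_vector`, the array layout (`x_map[i-1, j-1]` holding *x*<sub>*i*,*j*</sub>), and the row-major ordering of the pixel pairs inside v are assumptions made for this example, not part of the original formulation.

```python
import numpy as np


def matching_cost(R, E, x_map, y_map):
    """Evaluate J_{R,E}(F) = ||R - E_F|| of Eq. (1) for a given mapping F.

    R, E   : arrays of shape (I, I) or (I, I, d) holding pixel features.
    x_map, y_map : integer arrays of shape (I, I); entry [i-1, j-1] holds the
                   1-based coordinate x_ij (resp. y_ij) on E matched to
                   pixel (i, j) of R.  (Layout assumed for this sketch.)
    """
    # E_F: E resampled so that pixel (i, j) takes the feature at (x_ij, y_ij).
    E_F = E[x_map - 1, y_map - 1]
    # With no axis/ord given, norm() returns the 2-norm of the flattened array.
    return float(np.linalg.norm(R - E_F))


def deformation_vector(x_map, y_map):
    """Assemble the 2*I^2-dimensional deformation vector v of Eq. (2)."""
    I = x_map.shape[0]
    # 1-based pixel coordinates (i, j) of the reference image R.
    i_grid, j_grid = np.meshgrid(
        np.arange(1, I + 1), np.arange(1, I + 1), indexing="ij"
    )
    dx = i_grid - x_map  # i - x_ij
    dy = j_grid - y_map  # j - y_ij
    # Interleave the (dx, dy) pairs in row-major pixel order (assumed order).
    return np.stack([dx, dy], axis=-1).reshape(-1)
```

For the identity mapping (every pixel matched to itself), `deformation_vector` returns the zero vector, and `matching_cost(R, R, ...)` is zero; any non-trivial mapping produces non-zero displacement entries.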

#### **2.2 Estimations of eigen-deformations**

Eigen-deformations of a category are intrinsic deformations of the category and are defined as *M* principal axes {u<sub>1</sub>, ..., u<sub>*m*</sub>, ..., u<sub>*M*</sub>} which span an *M*-dimensional subspace in the 2*I*<sup>2</sup>-dimensional deformation space. The eigen-deformations can be estimated by applying PCA to {v<sub>*n*</sub> | *n* = 1, ..., *N*}, where v<sub>*n*</sub> is the extracted deformation between R and E<sub>*n*</sub>. Specifically, the eigen-deformations are obtained as the eigenvectors of the covariance matrix Σ = ∑<sub>*n*</sub>(v<sub>*n*</sub> − v̄)(v<sub>*n*</sub> − v̄)<sup>*T*</sup>/*N*, where v̄ is the mean vector of {v<sub>*n*</sub>}.

Figure 2 shows the first three eigen-deformations estimated from 500 handwritten characters of the category "A". The first eigen-deformation u<sub>1</sub>, that is, the most frequent deformation of "A", was the global slant transformation. The second was the vertical shift of the horizontal stroke and the third was the width variation of the upper part. Consequently, this figure confirms that frequent deformations of "A" were extracted successfully.

Fig. 2. Eigen-deformations of handwritten characters.

Note that in this experiment, the dimensionality of the deformation vector v was 74 though the size of the character image pattern was 20 × 20 (i.e., *I* = 20 and 2*I*<sup>2</sup> = 800). This is because a "sparse" elastic matching was used, in which the displacements of 3 pixels (leftmost, middle, and rightmost) were optimized at every row. The displacements of the other pixels were given by linear interpolation.

Figure 3 shows the patterns R deformed by the first three eigen-deformations u<sub>1</sub>, u<sub>2</sub>, and u<sub>3</sub>, amplified by *k*√*λ*<sub>*m*</sub> (*k* = −2, −1, 0, 1, 2), where *λ*<sub>*m*</sub> is the eigenvalue of the *m*th eigenvector. This figure also shows that frequent deformations were extracted as the eigen-deformations of each category.

Fig. 3. Reference pattern R deformed by top three eigen-deformations, u<sub>1</sub>, u<sub>2</sub>, and u<sub>3</sub>.

Figure 4 shows the cumulative proportion of each category. The cumulative proportion by the top *M* eigen-deformations is defined as *ρ*(*M*) = ∑<sub>*m*=1</sub><sup>*M*</sup> *λ*<sub>*m*</sub> / ∑<sub>*m*=1</sub><sup>74</sup> *λ*<sub>*m*</sub>. In all categories, the cumulative proportion exceeded 50% with the top 3 to 5 eigen-deformations and 80% with the top 10 to 20 eigen-deformations. Thus, the distribution of deformation vectors was not isotropic and can be approximated by a small number of eigen-deformations. In other words, there existed a low-dimensional and efficient subspace of deformations.

Fig. 4. Category-wise cumulative proportion *ρ*(*M*) of eigen-deformations at *M* = 1, 3, 5, 10, 20, and 30. Note that *ρ*(*M*) = 100% at *M* = 74.
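The estimation procedure of this section can be sketched in a few lines of Python/NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names `estimate_eigen_deformations` and `cumulative_proportion` are hypothetical, and the deformation vectors are assumed to be stacked as the rows of an array `V`. The covariance matrix is normalized by *N*, as in the text.

```python
import numpy as np


def estimate_eigen_deformations(V):
    """Estimate eigen-deformations by PCA.

    V : array of shape (N, D); row n is the deformation vector v_n.
    Returns the mean vector, the eigenvalues in descending order, and the
    eigenvectors as columns in the matching order (column m-1 is u_m).
    """
    v_mean = V.mean(axis=0)
    centered = V - v_mean
    # Covariance matrix Sigma = sum_n (v_n - mean)(v_n - mean)^T / N.
    sigma = centered.T @ centered / V.shape[0]
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]
    # Clip tiny negative values caused by round-off.
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    return v_mean, eigvals, eigvecs


def cumulative_proportion(eigvals, M):
    """rho(M): share of total variance captured by the top M eigen-deformations."""
    return float(eigvals[:M].sum() / eigvals.sum())
```

Given the eigenvalues, `cumulative_proportion(eigvals, M)` can be evaluated for increasing `M` to choose how many eigen-deformations to keep.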

#### **2.3 Recognition with eigen-deformations (1)**

The eigen-deformations can be utilized for recognizing handwritten character images. A direct use of the eigen-deformations for evaluating a distance between two characters R and E is as follows:

$$D_{\mathrm{disp}}(\mathbf{R}, \mathbf{E}) = (\mathbf{v} - \overline{\mathbf{v}})^T \boldsymbol{\Sigma}^{-1} (\mathbf{v} - \overline{\mathbf{v}}) = \sum_{m=1}^{2I^2} \frac{1}{\lambda_m} \left\langle \mathbf{v} - \overline{\mathbf{v}}, \mathbf{u}_m \right\rangle^2, \tag{3}$$

where E is an unknown input image and v is the deformation extracted by the elastic matching between R and E. This is the well-known Mahalanobis distance, which evaluates how far the deformation estimated on E diverges statistically from the deformations that usually appear in the category of R. If the estimated deformation v gives a large distance value, the result of elastic matching between E and R is somewhat abnormal, and therefore the category of R will not become a candidate for the correct category of E.

The recognition performance by *D*<sub>disp</sub>(R, E) alone, however, is not satisfactory. This is because the distance *D*<sub>disp</sub>(R, E) completely neglects the distance of pixel features. This fact is confirmed by the experimental result in Section 2.5.

An alternative and reasonable choice is the linear combination of the distance in the pixel feature space and the distance in the deformation space (Uchida & Sakoe, 2003b), that is,

$$D_{\mathrm{hybrid}}(\mathbf{R}, \mathbf{E}) = (1 - w) D_{\mathrm{feat}}(\mathbf{R}, \mathbf{E}) + w D_{\mathrm{disp}}(\mathbf{R}, \mathbf{E}), \tag{4}$$

where *D*<sub>feat</sub>(R, E) is the elastic matching distance in the pixel feature space, i.e.,

$$D_{\mathrm{feat}}(\mathbf{R}, \mathbf{E}) = J_{\mathbf{R},\mathbf{E}}(\tilde{F}), \tag{5}$$

and *w* is a constant (0 ≤ *w* ≤ 1) to balance the two distances.
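The distances (3) and (4) can be sketched as follows in Python/NumPy. This is a minimal illustration under stated assumptions: the function names `d_disp` and `d_hybrid` are hypothetical, the eigenvectors are assumed to be stored as the columns of `eigvecs`, and all eigenvalues are assumed to be strictly positive so that the division in (3) is well defined.

```python
import numpy as np


def d_disp(v, v_mean, eigvals, eigvecs):
    """Mahalanobis distance of Eq. (3) in the deformation space.

    eigvals : eigenvalues lambda_m of the covariance matrix (all > 0 assumed).
    eigvecs : matrix whose column m-1 is the eigen-deformation u_m.
    """
    # Inner products <v - v_mean, u_m> for every m.
    proj = eigvecs.T @ (v - v_mean)
    return float(np.sum(proj ** 2 / eigvals))


def d_hybrid(d_feat_value, d_disp_value, w):
    """Linear combination of Eq. (4) with weight 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * d_feat_value + w * d_disp_value
```

Setting `w = 0` reduces `d_hybrid` to the pixel-feature distance alone, and `w = 1` reduces it to the deformation distance alone, so the weight can be tuned between these two extremes on training data.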

In practice, the modified Mahalanobis distance (Kimura et al., 1987) is employed instead of (3). Specifically, the higher-order eigenvalues *λ<sup>m</sup>* (*m* = *M* + 2, . . . , 2*I*2) are replaced by

averaged computation time (ms)

The minimization problem (8) with respect to *α* is hard to solve directly. This is because the *M*-dimensional parameter vector *α* to be optimized is involved in the nonlinear function R.

In Uchida & Sakoe (2003a), the approximation scheme used in the tangent distance method (Simard et al., 1992) has been employed for the above minimization problem. As shown in Fig. 5, the minimum distance min*<sup>α</sup> JR*,*E*(*α*) can be approximated by the following *tangent*

where T*<sup>α</sup>* is the tangent plane of the manifold at *α* = 0. The tangent plane is an *M*-dimensional hyperplane in the feature space and linear with respect to *α*. Thus the minimization problem of (9) has a closed-form solution. Intuitively speaking, the distance *D*TD(R, E) is the Euclidean distance between the input E and its closest point on the tangent plane. Figure 6 shows three

Figure 7 shows results of a handwritten character recognition experiment using 26 (categories) × 1,100 (samples) isolated handwritten English uppercase character images from the standard character image database ETL6. The first 100 samples of each category were simply averaged to create one reference pattern R and the next 500 samples were used as training samples E*n* to estimate the eigen-deformations. The remaining 500 samples (13, 000 = 26 × 500 samples

The highest recognition rate (99.47%) was attained by *D*hybrid with its best weight *w*. The recognition rate by *D*disp, i.e., the recognition rate by evaluating only the deformation v, was not sufficient. Thus, the pixel features (i.e., appearance features) should not be neglected for evaluating the distance of two character images. The recognition rates by *D*TD were saturated around *M* = 3. This result is supported by the fast saturation of the cumulative proportion of

0.01 0.1 1 10 100 1000

50

Statistical Deformation Model for Handwritten Character Recognition 7

DTD

6 10 <sup>20</sup> <sup>5</sup>

Dfeat

Ddisp (93.6%)

*<sup>D</sup>*TD(R, <sup>E</sup>) = min*<sup>α</sup>* �T*<sup>α</sup>* <sup>−</sup> <sup>E</sup>� , (9)

Dhybrid

98.0

98.2

98.4

98.6

98.8

M=1

rigid matching

Fig. 7. Relation between computation time (ms) and recognition rate (%).

Thus, some approximation is required to solve the optimization problem.

tangent vectors which span the tangent plane of the category "A".

2 3

recognition rate (%)

*distance*,

Fig. 4.

**2.5 Recognition result**

in total) were used as test samples E.

99.0

99.2

99.4

99.6

Fig. 5. Manifold R*α*, its tangent plane T*α*, and tangent distance *D*TD(R, E).

Fig. 6. Tangent vectors of the category "A", derived from R and eigen-deformations u1, u2, and u3.

*λM*+1, to suppress the estimation errors of higher-order eigenvalues in (3). According to this replacement, (3) is reduced to

$$D\_{\rm disp}(\mathbf{R}, \mathbf{E}) \sim \frac{1}{\lambda\_{M+1}} \|\mathbf{v} - \overline{\mathbf{v}}\| + \sum\_{m=1}^{M} \left(\frac{1}{\lambda\_m} - \frac{1}{\lambda\_{M+1}}\right) \langle \mathbf{v} - \overline{\mathbf{v}}, \mathbf{u}\_m \rangle^2. \tag{6}$$

The parameter *M* is to be determined experimentally, for example, considering the cumulative proportion *ρ*(*M*).

#### **2.4 Recognition with eigen-deformations (2)**

The above recognition method has the weakness that two heterogeneous distances, *D*feat and *D*disp, are naively added to create the single distance *D*hybrid. In contrast, the following method (Uchida & Sakoe, 2003a) avoids this weakness by embedding the eigen-deformations into an elastic matching procedure.

Consider that the mapping F is defined as a linear combination of eigen-deformations, i.e.,

$$F(\boldsymbol{\alpha}) = \sum\_{m=1}^{M} \alpha\_{m} \boldsymbol{u}\_{m} \tag{7}$$

where *α* = (*α*1,..., *αm*,..., *αM*)*T*. Then an elastic matching problem with F (*α*) can be formulated as the minimization problem of the following objective function:

$$J\_{\mathbf{R},\mathbf{E}}(\boldsymbol{\alpha}) = \left\| \mathbf{R}\_{F(\boldsymbol{\alpha})} - \mathbf{E} \right\|, \tag{8}$$

where R<sub>*F*(*α*)</sub> is the reference pattern deformed by the mapping F (*α*).

The set of deformed reference patterns, {R<sub>*F*(*α*)</sub> | ∀*α*}, forms an *M*-dimensional manifold in the (*I*<sup>2</sup> · *d*)-dimensional pixel feature space. Thus the minimum value of *J*<sub>R,E</sub>(*α*) is equivalent to the shortest distance between this *M*-dimensional manifold and E.

Fig. 5. Manifold R*α*, its tangent plane T*α*, and tangent distance *D*TD(R, E).

Fig. 6. Tangent vectors of the category "A", derived from R and eigen-deformations u1, u2, and u3.

Fig. 7. Relation between computation time (ms) and recognition rate (%).

The minimization problem (8) with respect to *α* is hard to solve directly. This is because the *M*-dimensional parameter vector *α* to be optimized is involved in the nonlinear function R. Thus, some approximation is required to solve the optimization problem.

In Uchida & Sakoe (2003a), the approximation scheme used in the tangent distance method (Simard et al., 1992) was employed for the above minimization problem. As shown in Fig. 5, the minimum distance min<sub>*α*</sub> *J*<sub>R,E</sub>(*α*) can be approximated by the following *tangent distance*,

$$D\_{\rm TD}(\mathbf{R}, \mathbf{E}) = \min\_{\boldsymbol{\alpha}} \left\| \mathbf{T}\_{\boldsymbol{\alpha}} - \mathbf{E} \right\|, \tag{9}$$

where T*α* is the tangent plane of the manifold at *α* = 0. The tangent plane is an *M*-dimensional hyperplane in the feature space and is linear with respect to *α*. Thus the minimization problem of (9) has a closed-form solution. Intuitively speaking, the distance *D*TD(R, E) is the Euclidean distance between the input E and its closest point on the tangent plane. Figure 6 shows three tangent vectors which span the tangent plane of the category "A".
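The closed-form solution of (9) can be illustrated with an ordinary least-squares fit. The sketch below assumes the tangent plane is parameterized as R + Σ*m* *αm* t*m*, with the tangent vectors t*m* already computed and stacked as columns of a matrix; the function name and this parameterization are assumptions made for illustration, not the implementation of Uchida & Sakoe (2003a).

```python
import numpy as np

def tangent_distance(R, E, tangents):
    """Distance from E to the plane {R + tangents @ alpha}.

    R, E     : flattened patterns, shape (D,)
    tangents : tangent vectors as columns, shape (D, M)  (assumed given)
    Returns (distance, alpha), where alpha minimizes ||R + tangents @ alpha - E||.
    """
    R = np.asarray(R, dtype=float).ravel()
    E = np.asarray(E, dtype=float).ravel()
    tangents = np.asarray(tangents, dtype=float)
    # Linear least squares: tangents @ alpha ~ E - R
    alpha, *_ = np.linalg.lstsq(tangents, E - R, rcond=None)
    closest = R + tangents @ alpha          # closest point on the tangent plane
    return float(np.linalg.norm(closest - E)), alpha

# Toy example: the plane is spanned by the first two coordinate axes of R^3.
R_toy = np.zeros(3)
T_toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
E_toy = np.array([2.0, 3.0, 4.0])
d_td, alpha_opt = tangent_distance(R_toy, E_toy, T_toy)
```

Because the plane is linear in *α*, no iterative optimization is needed; the cost is dominated by one least-squares solve with *M* unknowns.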

#### **2.5 Recognition result**

Figure 7 shows the results of a handwritten character recognition experiment using 26 (categories) × 1,100 (samples) isolated handwritten English uppercase character images from the standard character image database ETL6. The first 100 samples of each category were simply averaged to create one reference pattern R, and the next 500 samples were used as training samples E*n* to estimate the eigen-deformations. The remaining 500 samples (26 × 500 = 13,000 samples in total) were used as test samples E.

The highest recognition rate (99.47%) was attained by *D*hybrid with its best weight *w*. The recognition rate obtained with *D*disp alone, i.e., by evaluating only the deformation v, was insufficient (93.6% in Fig. 7). Thus, the pixel features (i.e., appearance features) should not be neglected when evaluating the distance between two character images. The recognition rates by *D*TD saturated around *M* = 3. This result is consistent with the fast saturation of the cumulative proportion shown in Fig. 4.


#### **2.6 Related work**

The original idea of eigen-deformations, i.e., principal components of deformations, can be found in the point distribution model (PDM), which was proposed by Cootes et al. (1995) and has been applied to various patterns. Shen & Davatzikos (2000) introduced an automatic deformation collection scheme into the PDM. The PDM for curvilinear patterns has been applied to face recognition (Lanitis et al., 1997), Chinese character recognition (Shi et al., 2003), and hand posture recognition (Ahmad et al., 1997). Uchida & Sakoe (2003b) extended the PDM to deal with fully 2D deformations and applied it to an elastic matching-based handwritten character recognition system.

Iwai et al. (1997) applied PCA to interframe motion vector fields obtained by block matching, which can be considered the simplest form of elastic matching. Bing et al. (2002) proposed a face expression recognition method based on a subspace of face deformations. Naster et al. (1997) analyzed a deformation vector extended to deal with the variation of the pixel feature value. These ideas are promising for recognizing handwritten character images.

The eigen-deformations are the principal axes spanning a subspace of the 2*I*<sup>2</sup>-dimensional deformation space. Any point on the subspace represents a deformation F . On the other hand, we can consider a subspace of the (*I*<sup>2</sup> · *d*)-dimensional pixel feature space. Any point on this subspace represents an *I* × *I* × *d* image pattern. The axes spanning this subspace are derived as the dominant eigen-vectors of the covariance matrix Σ = ∑*n*(E*n* − Ē)(E*n* − Ē)<sup>*T*</sup>/*N*, where Ē is the mean vector of {E*n*}. There is a large body of research on such subspaces (Oja, 1983). Eigenface (Turk & Pentland, 1991) and parametric eigenspace (Hase et al., 2003; Murase & Nayar, 1994) are famous examples. While the subspace derived in the above manner can represent a set of deformed character patterns, the subspace spanned by the eigen-deformations represents the same set in a more compact manner. Consider a character image R and a set of character images created by translating R. The number of eigen-deformations estimated from the set is two; one represents the horizontal shift and the other the vertical shift. In contrast, the number of principal eigen-vectors in the pixel feature space will be far larger than two. This advantage holds for other geometric deformations as well, and thus the subspace of deformations can be a more efficient representation than the subspace of the pixel features.
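The translation example can be checked numerically with a small sketch. Everything below is a toy setup chosen for illustration (a 16 × 16 binary "T"-like shape, integer shifts of up to two pixels, and a hypothetical helper `n_significant`); it is not data from the chapter.

```python
import numpy as np

I = 16
R = np.zeros((I, I))
R[5:7, 5:11] = 1.0      # horizontal bar of a toy "T" shape
R[7:11, 7:9] = 1.0      # vertical bar

# 25 translations (dx, dy), small enough that the shape never wraps around.
shifts = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)]

# Deformation vectors: every pixel carries the same 2D displacement (dx, dy),
# so each vector lives in the 2*I^2-dimensional deformation space.
V = np.array([np.tile([dx, dy], I * I) for dx, dy in shifts], dtype=float)

# Pixel feature vectors: the translated images themselves (I^2-dimensional, d = 1).
P = np.array([np.roll(R, (dy, dx), axis=(0, 1)).ravel() for dx, dy in shifts])

def n_significant(X, tol=1e-8):
    """Number of non-negligible principal components of the rows of X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rank_deform = n_significant(V)   # horizontal and vertical shift only
rank_pixel = n_significant(P)    # many more directions are needed
```

Under these assumptions the deformation vectors span exactly two principal directions, whereas the translated images need many more, which is the compactness argument made above.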

#### **3. Statistical deformation model of online handwritten character recognition**

#### **3.1 Extraction of deformations by elastic matching**

Consider two online handwritten character patterns, R = r1, r2,..., r*i*,..., r*I* and E = e1, e2,..., e*x*,..., e*I*′. The former is a reference character pattern and the latter is an input character pattern. Their elements r*i* and e*x* are *d*-dimensional feature vectors representing the features at *i* and *x*; they are often 3-dimensional vectors consisting of the *x*-coordinate, the *y*-coordinate, and the local direction.

Let F denote a 1D-1D mapping from R to E, i.e., F : *i* ↦ *x*. Figure 8 depicts F . Elastic matching between R and E is formulated as the minimization of the following objective function with respect to F ,

$$J\_{\mathbf{R},\mathbf{E}}(F) = \|\mathbf{R} - \mathbf{E}\_F\|, \tag{10}$$

where E*F* is the character pattern obtained by fitting E to R, i.e., E*F* = e*x*<sub>1</sub>,..., e*x*<sub>*i*</sub>,..., e*x*<sub>*I*</sub>, where *x*<sub>*i*</sub> represents the *i* − *x* correspondence under F . In the minimization, several constraints (such as the monotonicity and continuity constraint *x*<sub>*i*</sub> − *x*<sub>*i*−1</sub> ∈ {0, 1, 2} and the boundary constraints *x*<sub>1</sub> = 1 and *x*<sub>*I*</sub> = *I*′) are often assumed to regularize F . This constrained minimization problem can be solved efficiently by a DP algorithm called dynamic time warping or DP matching; its details are omitted here.

Fig. 8. Elastic matching between two online handwritten character patterns.

The deformation of E from R is represented by the following (*I* · *d*)-dimensional deformation vector,

$$\mathbf{v} = \left( \mathbf{e}\_{x\_1} - \mathbf{r}\_1, \ldots, \mathbf{e}\_{x\_i} - \mathbf{r}\_i, \ldots, \mathbf{e}\_{x\_I} - \mathbf{r}\_I \right)^T. \tag{11}$$

It should be noted that the dimension of the above deformation vector v is fixed at (*I* · *d*) and is independent of the length of E, i.e., *I*′. This property is important for applying statistical methods, such as PCA, to sequential patterns.
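Since the DP details are omitted, the following is a minimal sketch of one possible DP matching under the constraints stated above, followed by the construction of the deformation vector of (11). It assumes a squared Euclidean local cost (minimizing the sum of squares also minimizes the norm in (10)), uses 0-based indices instead of the 1-based indices of the text, and the function name is hypothetical.

```python
import numpy as np

def dp_matching(R, E):
    """Elastic matching of E (shape (I', d)) to R (shape (I, d)).

    Constraints: x_0 = 0, x_{I-1} = I'-1, and x_i - x_{i-1} in {0, 1, 2}.
    Returns (xs, v, distance): the mapping, the (I*d)-dimensional deformation
    vector of (11), and the minimum of (10).
    """
    R = np.asarray(R, dtype=float)
    E = np.asarray(E, dtype=float)
    n_ref, n_in = len(R), len(E)
    # cost[i, x] = squared distance between r_i and e_x
    cost = ((R[:, None, :] - E[None, :, :]) ** 2).sum(axis=2)
    g = np.full((n_ref, n_in), np.inf)        # accumulated cost
    back = np.zeros((n_ref, n_in), dtype=int)  # step taken to reach (i, x)
    g[0, 0] = cost[0, 0]                       # boundary constraint at the start
    for i in range(1, n_ref):
        for x in range(n_in):
            for step in (0, 1, 2):
                p = x - step
                if p >= 0 and g[i - 1, p] + cost[i, x] < g[i, x]:
                    g[i, x] = g[i - 1, p] + cost[i, x]
                    back[i, x] = step
    if not np.isfinite(g[-1, -1]):
        raise ValueError("no admissible mapping: E is too long relative to R")
    xs = np.empty(n_ref, dtype=int)
    xs[-1] = n_in - 1                          # boundary constraint at the end
    for i in range(n_ref - 1, 0, -1):
        xs[i - 1] = xs[i] - back[i, xs[i]]
    v = (E[xs] - R).ravel()                    # fixed dimension I*d, see (11)
    return xs, v, float(np.sqrt(g[-1, -1]))

R_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
E_demo = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
xs_demo, v_demo, dist_demo = dp_matching(R_demo, E_demo)
```

Whatever the length of the input, the returned vector has *I* · *d* elements, which is the property that makes PCA applicable.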

Also note that it is possible to define v as

$$\mathbf{v} = \left( 1 - x\_1, \ldots, i - x\_i, \ldots, I - x\_I \right)^T.$$

Although this definition is a straightforward modification of the deformation vector of (2), we will use v of (11) as the deformation vector here. This is because, in online character recognition, r*i* and e*x* are often spatial features, and thus their difference itself represents a deformation.

#### **3.2 Estimation of eigen-deformations**

Eigen-deformations of online handwritten character patterns are also estimated by the procedure of Section 2.2; that is, they can be estimated as the dominant eigen-vectors of the covariance matrix of v.

Eigen-deformations of online handwritten digits were estimated by using about 1,000 samples from the UNIPEN Train-R01/V07 database (1a) (Guyon et al., 1994). Figure 9 shows character patterns generated by R + v̄ ± 2√*λm* u*m* (*m* = 1, 2) (Mitoma et al., 2005). That is, those patterns are reference patterns deformed by their mean deformation vector v̄ and the first two eigen-deformations u*m*. Note that the effect of v̄ was not significant because R was set around the center of the set of training samples by a clustering technique, and thus the norm of v̄ was small.

Figure 9 shows that deformations frequently observed in actual characters were estimated as eigen-deformations. For example, the first eigen-deformation of "6" represents the vertical variation of its loop part, and the second one represents the horizontal variation of the loop part.
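The estimation step and the patterns R + v̄ ± 2√*λm* u*m* can be sketched as below. This is an illustrative outline only: it assumes the deformation vectors have already been extracted by elastic matching and stacked row-wise, normalizes the covariance by the number of samples, and uses hypothetical function names.

```python
import numpy as np

def estimate_eigen_deformations(V, M):
    """PCA of deformation vectors.

    V : array of shape (N, I*d), one deformation vector per training sample.
    Returns (v_mean, lam, U): mean vector, the M largest eigenvalues, and the
    corresponding eigen-deformations as columns of U.
    """
    V = np.asarray(V, dtype=float)
    v_mean = V.mean(axis=0)
    cov = np.cov(V, rowvar=False, bias=True)   # covariance normalized by N
    lam, U = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:M]          # keep the M dominant ones
    return v_mean, lam[order], U[:, order]

def deformed_references(R, v_mean, lam, U, scale=2.0):
    """Return [(R + v_mean + s*sqrt(lam_m)*u_m, R + v_mean - s*sqrt(lam_m)*u_m)]."""
    r = np.asarray(R, dtype=float).ravel()
    pairs = []
    for m in range(len(lam)):
        delta = scale * np.sqrt(max(lam[m], 0.0)) * U[:, m]
        pairs.append((r + v_mean + delta, r + v_mean - delta))
    return pairs

# Toy data: most variation along the first axis, less along the second.
V_toy = np.array([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
v_bar, lam_toy, U_toy = estimate_eigen_deformations(V_toy, M=2)
pairs_toy = deformed_references(np.zeros(3), v_bar, lam_toy, U_toy)
```

Plotting each pair for a real reference pattern would give pictures in the spirit of Fig. 9; the sign of each eigenvector is arbitrary, so the "+" and "−" patterns may swap.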

Fig. 9. Reference character pattern deformed by the first two eigen-deformations of "2" and "6".

#### **3.3 Recognition with eigen-deformations**

For online handwritten character recognition based on the eigen-deformations, the following quadratic discriminant function (QDF) is a possible choice (Mitoma et al., 2005). The QDF is the Bayes discriminant function under the assumption that the deformation vectors have a Gaussian distribution, and is defined as

$$\begin{split} D\_{\text{QDF}}(\mathbf{R}, \mathbf{E}) &= (\mathbf{v} - \overline{\mathbf{v}})^T \boldsymbol{\Sigma}^{-1} (\mathbf{v} - \overline{\mathbf{v}}) + \log|\boldsymbol{\Sigma}| + (I \cdot d) \log 2\pi \\ &= \sum\_{m=1}^{I \cdot d} \frac{1}{\lambda\_m} \langle \mathbf{v} - \overline{\mathbf{v}}, \mathbf{u}\_m \rangle^2 + \log \prod\_{m=1}^{I \cdot d} \lambda\_m + (I \cdot d) \log 2\pi. \end{split} \tag{12}$$

The last term, (*I* · *d*)log 2*π*, cannot be omitted here because each category has a different dimension of v (i.e., *I* · *d*).

As noted in Section 2.3, the estimation errors of higher-order eigenvalues are amplified in (12). Thus, the modified quadratic discriminant function (MQDF) (Kimura et al., 1987) was employed, where the higher-order eigenvalues *λm* (*m* = *M* + 1, . . . , *I* · *d*) are replaced by *λM*+1, i.e.,

$$\begin{split} D\_{\text{MQDF}}(\mathbf{R}\_{c},\mathbf{E}) \sim \frac{1}{\lambda\_{M+1}} \left\| \mathbf{v} - \overline{\mathbf{v}} \right\|^{2} + \sum\_{m=1}^{M} \left( \frac{1}{\lambda\_{m}} - \frac{1}{\lambda\_{M+1}} \right) \left\langle \mathbf{v} - \overline{\mathbf{v}}, \mathbf{u}\_{m} \right\rangle^{2} \\ + \log \left\{ (\lambda\_{M+1})^{I \cdot d - M} \prod\_{m=1}^{M} \lambda\_{m} \right\} + (I \cdot d) \log 2\pi. \end{split} \tag{13}$$

The parameter *M* is to be determined experimentally.
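A direct transcription of (13) might look like the sketch below. It assumes the eigenvalues are sorted in descending order with at least *M* + 1 of them available, and that the eigen-deformations are orthonormal columns; the function name and argument layout are hypothetical.

```python
import numpy as np

def mqdf(v, v_mean, lam, U, M):
    """Evaluate the MQDF of (13) for one category.

    v, v_mean : deformation vector and its category mean, shape (D,), D = I*d
    lam       : eigenvalues in descending order, length >= M + 1
    U         : eigenvectors as columns, shape (D, >= M)
    M         : number of eigen-deformations kept
    """
    v = np.asarray(v, dtype=float)
    lam = np.asarray(lam, dtype=float)
    D = v.size
    diff = v - np.asarray(v_mean, dtype=float)
    lam_tail = lam[M]                        # lambda_{M+1} in 1-based notation
    proj = U[:, :M].T @ diff                 # <v - v_mean, u_m> for m = 1..M
    quad = diff @ diff / lam_tail + np.sum((1.0 / lam[:M] - 1.0 / lam_tail) * proj ** 2)
    log_det = (D - M) * np.log(lam_tail) + np.sum(np.log(lam[:M]))
    return float(quad + log_det + D * np.log(2.0 * np.pi))
```

For recognition, this value would be computed against each category's reference pattern and statistics, and the category with the smallest value selected; the constant term is kept because *I* · *d* differs between categories.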

#### **3.4 Recognition results**

Fig. 10. Accuracy of online character recognition based on eigen-deformations (horizontal axis: number of reference patterns; vertical axis: recognition rate (%); curves: *D*MQDF and *D*DP).

Figure 10 shows the results of an online character recognition experiment using digit samples from the UNIPEN database. Recognition rates attained by *D*MQDF are plotted as a function of the total number of reference patterns, which were created by a clustering technique. The recognition rates attained by the conventional DP-matching distance (*D*DP), which equals the minimum value of (10), are also plotted.

As shown in Fig. 10, MQDF with the eigen-deformations outperformed the DP-matching distance. This is probably because elastic matching results F that deviated from the distribution of the deformations of the category were penalized by the eigen-deformations in MQDF. Thus, the above recognition method can avoid misrecognitions due to overfitting, which is the phenomenon that the distance between E and R of a wrong category is underestimated by an unnatural mapping F .

This result also shows that *D*MQDF outperforms statistical dynamic time warping (SDTW) (Bahlmann & Burkhardt, 2004), which is a recent and sophisticated online character recognition technique. In fact, it has been reported in Bahlmann & Burkhardt (2004) that SDTW attained 97.10% on the same UNIPEN data set with 150 reference patterns.

#### **3.5 Related work**

Sequential patterns, such as online handwritten character patterns, are often re-sampled to have the same dimension before applying PCA or other statistical analysis techniques. For example, Deepu et al. (2004) proposed an online character recognition technique based on a subspace method where all online character patterns are re-sampled to have a constant number of data points. The online character recognition technique by Zheng et al. (1999) is more radical because it uses only two points (i.e., the start point and the end point) for each character stroke segment. In the handwriting synthesis technique by Wang et al. (2005), online cursive handwriting samples are first aligned to have the same dimension and then PCA is applied to them. PCA-based gesture/motion analysis techniques (Fod et al., 2002; Sanger, 1995; Yacoob & Black, 1999) also re-sample gesture patterns to have the same dimension. An exception is Martens & Claesen (1996), which employed elastic matching to extract a fixed-dimensional deformation vector from online signatures.

#### **4. Conclusion**

Statistical deformation models of handwritten character images and online handwritten character patterns have been introduced. The core of these models is the eigen-deformations, which are deformations frequently observed in a certain category and which span a subspace in the deformation space of the category. For estimating the eigen-deformations, elastic matching and principal component analysis (PCA) were employed. The former was utilized to extract deformations of target patterns automatically. For the online patterns, elastic matching was also utilized to compensate for differences in their lengths. The latter was utilized to derive the eigen-deformations as the principal components of the extracted deformations.

The usefulness of the statistical deformation models with eigen-deformations has been confirmed experimentally. The estimated eigen-deformations could represent frequently observed deformations in each character category. In addition, the eigen-deformations were useful for improving accuracy in both offline and online character recognition tasks.

#### **5. References**

Kimura, F.; Takashina, K. & Tsuruoka, S. (1987). Modified quadratic discriminant functions and the application to Chinese character recognition, *IEEE Trans. PAMI*, Vol. 9, No. 1, pp. 149–153.

Kuo, S. S. & Agazzi, O. E. (1994). Keyword spotting in poorly printed documents using pseudo 2-D hidden Markov models, *IEEE Trans. PAMI*, Vol. 16, No. 8, pp. 842–848.

Lanitis, A.; Taylor, C. J. & Cootes, T. F. (1997). Automatic interpretation and coding of face images using flexible models, *IEEE Trans. PAMI*, Vol. 19, No. 7, pp. 743–756.

Martens, R. & Claesen, L. (1996). On-line signature verification by dynamic time-warping, In: *Proc. ICPR*, pp. 38–42.

Mitoma, H.; Uchida, S. & Sakoe, H. (2005). Online character recognition based on elastic matching and quadratic discrimination, In: *Proc. ICDAR*, Vol. 1 of 2, pp. 36–40.

Murase, H. & Nayar, S. K. (1994). Illumination planning for object recognition using parametric eigenspace, *IEEE Trans. PAMI*, Vol. 16, No. 12, pp. 1219–1227.

Nag, R.; Wong, K. H. & Fallside, F. (1986). Script recognition using hidden Markov models, In: *Proc. ICASSP*, Vol. 3, pp. 2071–2074.

Nakai, M.; Akira, N.; Shimodaira, H. & Sagayama, S. (2001). Substroke approach to HMM-based on-line Kanji handwriting recognition, In: *Proc. ICDAR*, pp. 491–495.

Naster, C.; Moghaddam, B. & Pentland, A. (1997). Flexible images: matching and recognition using learned deformations, *Comput. Vis. Image Und.*, Vol. 65, No. 2, pp. 179–191.

Oja, E. (1983). *Subspace Methods of Pattern Recognition*, Research Studies Press and J. Wiley.

Park, H.-S. & Lee, S.-W. (1998). A truly 2-D hidden Markov model for off-line handwritten character recognition, *Pattern Recognit.*, Vol. 31, No. 12, pp. 1849–1864.

Sanger, T. D. (1995). Optimal movement primitives, *Advances in Neural Info. Proc. Systems*, Vol. 7, pp. 1023–30.

Shen, D. & Davatzikos, C. (2000). An adaptive-focus deformable model using statistical and geometric information, *IEEE Trans. PAMI*, Vol. 22, No. 8, pp. 906–913.

Shi, D.; Gunn, S. R. & Damper, R. I. (2003). Handwritten Chinese radical recognition using nonlinear active shape models, *IEEE Trans. PAMI*, Vol. 25, No. 2, pp. 277–280.

Simard, P.; Le Cun, Y.; Denker, J. & Victorri, B. (1992). An efficient algorithm for learning invariances in adaptive classifier, In: *Proc. ICPR*, Vol. 2, pp. 651–655.

Turk, M. & Pentland, A. (1991). Eigenfaces for recognition, *Journal of Cognitive Neuroscience*, Vol. 3, No. 1, pp. 71–86.

Uchida, S. & Sakoe, H. (2003). Handwritten character recognition using elastic matching based on a class-dependent deformation model, In: *Proc. ICDAR*, Vol. 1 of 2, pp. 163–167.

Uchida, S. & Sakoe, H. (2003). Eigen-deformations for elastic matching based handwritten character recognition, *Pattern Recognit.*, Vol. 36, No. 9, pp. 2031–2040.

Uchida, S. & Sakoe, H. (2005). A survey of elastic matching techniques for handwritten character recognition, *IEICE Trans. Inf. & Syst.*, Vol. E88-D, No. 8, pp. 1781–1790.

Wakahara, T. (1994). Shape matching using LAT and its application to handwritten numeral recognition, *IEEE Trans. PAMI*, Vol. 16, No. 6, pp. 618–629.

Wakahara, T. & Odaka, K. (1997). On-line cursive Kanji character recognition using stroke-based affine transformation, *IEEE Trans. PAMI*, Vol. 19, No. 12, pp. 1381–1385.

Wakahara, T.; Kimura, Y. & Tomono, A. (2001). Affine-invariant recognition of gray-scale characters using global affine transformation correlation, *IEEE Trans. PAMI*, Vol. 23, No. 4, pp. 384–395.


Ahmad, T.; Taylor, C. J.; Lanitis, A. & Cootes, T. F. (1997). Tracking and recognising hand gestures, using statistical shape models, *Image Vis. Computing*, Vol. 15, pp. 345–352.

Bahlmann, C. & Burkhardt, H. (2004). The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping, *IEEE Trans. PAMI*, Vol. 26, No. 3, pp. 299–310.

Bing, Y.; Ping, C. & Lianfu, J. (2002). Recognizing faces with expressions: within-class space and between-class space, In: *Proc. ICPR*, Vol. 1 of 4, pp. 139–142.

Burr, D. J. (1983). Designing a handwriting reader, *IEEE Trans. PAMI*, Vol. PAMI-5, No. 5, pp. 554–559.

Cho, W.; Lee, S.-W. & Kim, J. H. (1995). Modeling and recognition of cursive words with hidden Markov models, *Pattern Recognit.*, Vol. 28, No. 12, pp. 1941–1953.

Connell, S. D. & Jain, A. K. (2001). Template-based online character recognition, *Pattern Recognit.*, Vol. 34, No. 1, pp. 1–14.

Cootes, T. F.; Taylor, C. J.; Cooper, D. H. & Graham, J. (1995). Active shape models - their training and application, *Comput. Vis. Image Und.*, Vol. 61, No. 1, pp. 38–59.

Deepu, V.; Madhvanath, S. & Ramakrishnan, A. G. (2004). Principal component analysis for online handwritten character recognition, In: *Proc. ICPR*, Vol. 2 of 4, pp. 327–330.

Fod, A.; Mataric, M. & Jenkins, O. C. (2002). Automated derivation of primitives for movement classification, *Autonomous Robots*, Vol. 12, No. 1, pp. 39–54.

Fujimoto, Y.; Kadota, S.; Hayashi, S.; Yamamoto, M.; Yajima, S. & Yasuda, M. (1976). Recognition of handprinted characters by nonlinear elastic matching, In: *Proc. ICPR*, pp. 113–118.

Guyon, I.; Schomaker, L.; Plamondon, R.; Liberman, M. & Janet, S. (1994). UNIPEN project of on-line data exchange and recognizer benchmarks, In: *Proc. ICPR*, pp. 29–33.

Hase, H.; Shinokawa, T.; Yoneda, M. & Suen, C. Y. (2003). Recognition of rotated characters by eigen-space, In: *Proc. ICDAR*, Vol. 2, pp. 731–735.

Hu, J.; Brown, M. K. & Turin, W. (1996). HMM based on-line handwriting recognition, *IEEE Trans. PAMI*, Vol. 18, No. 10, pp. 1039–1045.

Iwai, Y.; Hata, T. & Yachida, M. (1997). Gesture recognition based on subspace method and hidden Markov model, In: *Proc. IROS*, Vol. 2 of 2, pp. 960–966.

Keysers, D.; Gollan, C. & Ney, H. (2004). Local context in non-linear deformation models for handwritten character recognition, In: *Proc. ICPR*, Vol. 4, pp. 511–514.


**2**

**Character Recognition with Metasets**

Bartłomiej Starosta

*Polish-Japanese Institute of Information Technology*

*Poland*

**1. Introduction**

The chapter presents a new approach to the character recognition problem. It is based on metasets – a new concept of sets with a partial membership relation. By the character recognition problem we understand determining the similarity degree of a given character sample to a defined character pattern. The discussed mechanism may be applied not only to characters (e.g. letters), but to arbitrary data represented on monochromatic images or even multi-dimensional figures.

The theory of metasets brings a new model of "fuzzy" membership relation for sets. A metaset may be a member of (or equal to) another metaset to a variety of different degrees – contrary to classical sets where membership and equality are always either true or false.

The goal of the chapter is to present the application of the new, abstract theory to solving a practical, well-known problem. It develops the method that was partially introduced, for a particular case, in (Starosta, 2009). The proposed solution has been implemented as a computer program. The experiments made with the program confirm that the theoretical assumptions are correct and that the obtained results properly reflect our perception of similarity of characters. It should also be stressed that the concept of metaset itself was partially inspired by another computer application for character recognition, based on neural networks.

**1.1 The general idea**

The process of determining the similarity degree consists of two stages. Initially, the compound character pattern must be prepared. It consists of several character samples accompanied by quality grades. The samples are depicted on rectangular matrices and they correspond to different forms of the same character. The pattern itself represents various possible approaches to the same character, as a single entity. In the second stage a testing character sample is matched against the pattern and the resulting similarity degree is calculated.

The character samples as well as the compound pattern are encoded as metasets. As a result of matching the testing sample against the pattern we obtain the membership degree of the sample metaset in the pattern metaset and, additionally, the sequence of equality degrees of the sample metaset and the pattern elements. The membership degree measures how closely the sample resembles the pattern. The equality degrees indicate the similarity of the input sample and each pattern element separately. The membership degrees as well as the equality degrees for metasets are expressed as sets of nodes of the binary tree, which are finite binary sequences, and they may be evaluated as real numbers.
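The remark that a degree, given as a set of binary-tree nodes, may be evaluated as a real number can be made concrete with a small sketch. The evaluation rule used here is an assumption for illustration and is not quoted from the chapter: each node is written as a string of `0`/`1` characters, nodes that extend another node of the set are ignored, and every remaining node of length n contributes 2^(-n). The names `minimal_nodes` and `evaluate_degree` are hypothetical.

```python
from fractions import Fraction


def minimal_nodes(nodes):
    """Keep only the nodes that have no proper prefix in the set.

    A node is a finite binary sequence written as a string such as "010";
    the empty string "" stands for the root of the binary tree.
    """
    nodes = set(nodes)
    return {
        p for p in nodes
        # p[:k] for k < len(p) enumerates every proper prefix of p.
        if not any(p[:k] in nodes for k in range(len(p)))
    }


def evaluate_degree(nodes):
    """Assumed evaluation: sum of 2^(-len(p)) over the minimal nodes p."""
    return sum(
        (Fraction(1, 2 ** len(p)) for p in minimal_nodes(nodes)),
        Fraction(0),
    )


# "0" covers half of the tree, "10" a further quarter; "01" lies below "0".
print(evaluate_degree({"0", "10", "01"}))
```

Under this assumed rule, the set containing only the root evaluates to 1 and the empty set evaluates to 0, which matches the intuition of full membership and no membership respectively.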

