**Character Recognition with Metasets**

#### Bartłomiej Starosta

*Polish-Japanese Institute of Information Technology Poland*

#### **1. Introduction**

The chapter presents a new approach to the character recognition problem. It is based on metasets – a new concept of sets with partial membership relation. By the character recognition problem we understand determining the similarity degree of the given character sample to the defined character pattern. The discussed mechanism may be applied not only to characters (e.g. letters), but to arbitrary data represented on monochromatic images or even multi-dimensional figures.

The theory of metasets brings a new model of "fuzzy" membership relation for sets. A metaset may be a member of (or equal to) another metaset to a variety of different degrees, contrary to classical sets, where membership and equality are always either true or false.

The goal of the chapter is to present the application of this new, abstract theory to a practical, well-known problem. It develops the method which was partially introduced, for a particular case, in (Starosta, 2009). The proposed solution has been implemented as a computer program. The experiments made with the program confirm that the theoretical assumptions are correct and that the obtained results properly reflect our perception of the similarity of characters. It should also be stressed that the concept of metaset itself was partially inspired by another computer application for character recognition, based on neural networks.

#### **1.1 The general idea**

The process of determining the similarity degree consists of two stages. First, the compound character pattern must be prepared. It consists of several character samples accompanied by quality grades. The samples are depicted on rectangular matrices and they correspond to different forms of the same character. The pattern itself represents various possible approaches to the same character as a single entity. In the second stage a testing character sample is matched against the pattern and the resulting similarity degree is calculated.

The character samples as well as the compound pattern are encoded as metasets. As the result of matching the testing sample against the pattern we obtain the membership degree of the sample metaset in the pattern metaset and, additionally, the sequence of equality degrees of the sample metaset and the pattern elements. The membership degree measures how closely the sample resembles the pattern. The equality degrees indicate the similarity of the input sample to each pattern element separately. The membership degrees as well as the equality degrees for metasets are expressed as sets of nodes of the binary tree, which are finite binary sequences, and they may be evaluated as real numbers.

The quality grades of the samples in the pattern are membership degrees of the corresponding metasets, too. However, they are manually specified as areas of the matrix for depicting the characters, which contain valid pixels to be included in the matching process. This specification is interpreted as membership degrees of appropriate metasets. The quality grades show how close a particular sample is to the ideal. They may be supplied by experts together with the samples.

The most significant innovation here is treating the membership and equality degrees of metasets as similarity measures for characters, provided the characters are properly encoded as metasets.

#### **1.2 Basic terms and notation**

The concept of binary tree plays the key role in the definition of metaset and related notions. Therefore, we start with establishing some well known terms and notation concerning it.

We use the symbol 𝕋 for the infinite binary tree with the root 𝟙. The nodes of the tree are finite binary sequences; the root is the empty sequence. For *p* ∈ 𝕋 the symbol |*p*| denotes the length of the sequence and #*p* denotes the natural number represented by the binary sequence *p*. Note that |𝟙| = 0, and we assume #𝟙 = 0. The ordering of nodes in 𝕋 is the reverse of the initial-segment relation: *p* ≤ *q* whenever *q* is an initial segment of *p*, so in particular |*p*| ≥ |*q*|. Thus the root 𝟙 is the largest element in 𝕋. The set of nodes of equal length *n* is called the *n*-th *level* in the tree: 𝕋<sub>*n*</sub> = { *p* ∈ 𝕋 : |*p*| = *n* }. Level 0 contains only the root. Nodes of the tree are sometimes called *conditions*. If *p* ≤ *q* ∈ 𝕋, then we say that the condition *p* is *stronger* than the condition *q*, and *q* is *weaker* than *p*. Thus, the conditions 0 and 1 are stronger than the root; the condition 0 is weaker than 00 and 01, and the condition 1 is weaker than 10 and 11. The conditions 00, 01, 10, 11 form level 2.
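The notation above can be mirrored in a few lines of Python. This is only a minimal sketch and not part of the original implementation: nodes are modelled as strings over the characters `0` and `1`, the empty string stands for the root, and the names `length`, `number`, `leq`, `comparable` and `level` are hypothetical. It assumes that *p* ≤ *q* means that *q* is an initial segment of *p*, which is what the later remarks on branches and antichains require.

```python
from itertools import product

ROOT = ""  # assumption: the empty string models the root (the empty sequence)


def length(p: str) -> int:
    """|p|: the length of the binary sequence p."""
    return len(p)


def number(p: str) -> int:
    """#p: the natural number written in binary by p; #ROOT is taken to be 0."""
    return int(p, 2) if p else 0


def leq(p: str, q: str) -> bool:
    """p <= q: q is an initial segment of p, so p is the stronger condition."""
    return p.startswith(q)


def comparable(p: str, q: str) -> bool:
    """Two nodes are comparable when one of them extends the other."""
    return leq(p, q) or leq(q, p)


def level(n: int) -> list[str]:
    """The n-th level of the tree: all binary sequences of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]
```

With this encoding every node is below the root, since every string starts with the empty string, and two nodes on the same level are comparable only when they are equal.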

Fig. 1. The binary tree and the ordering of nodes (conditions). Arrows point at the larger element, i.e., the weaker condition

A set of nodes *C* ⊂ 𝕋 is called a *chain* in 𝕋 whenever all its elements are pairwise comparable: ∀<sub>*p*,*q*∈*C*</sub> (*p* ≤ *q* ∨ *q* ≤ *p*). A set *A* ⊂ 𝕋 is called an *antichain* in 𝕋 if it consists of mutually incomparable elements: ∀<sub>*p*,*q*∈*A*</sub> (*p* ≠ *q* → ¬(*p* ≤ *q*) ∧ ¬(*p* ≥ *q*)). In Fig. 1, the elements { 00, 01, 100 } form a sample antichain. A *maximal antichain* is an antichain which cannot be extended by adding new elements; it is a maximal element with respect to inclusion of antichains. Examples of maximal antichains in Fig. 1 are { 0, 1 }, { 00, 01, 1 } and even { 𝟙 }. They are in fact maximal finite antichains (MFA). A *branch* is a maximal chain in the tree 𝕋. Note that *p* is comparable to *q* only if there exists a branch containing both *p* and *q*. Similarly, *p* is incomparable to *q* when no branch contains both *p* and *q*.

To finish this section we prove a property of maximal finite antichains which is necessary for evaluating, as numbers, the degrees represented as sets of nodes. Clearly, there are 2<sup>*n*</sup> nodes on the *n*-th level of the binary tree, so ∑<sub>*p*∈𝕋<sub>*n*</sub></sub> 1/2<sup>|*p*|</sup> = 1. This property may be generalized to an arbitrary MFA.
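The notions of antichain and maximal finite antichain can be checked mechanically. The sketch below is illustrative only: it models nodes as binary strings (the empty string being the root), and the function names are hypothetical. The maximality test relies on the observation that a finite antichain is maximal exactly when every node on its deepest level is comparable to one of its members.

```python
from fractions import Fraction
from itertools import product


def comparable(p: str, q: str) -> bool:
    """Nodes are comparable when one is an initial segment of the other."""
    return p.startswith(q) or q.startswith(p)


def is_antichain(nodes: set[str]) -> bool:
    """True when all distinct members are pairwise incomparable."""
    return all(not comparable(p, q) for p in nodes for q in nodes if p != q)


def is_maximal_finite_antichain(nodes: set[str]) -> bool:
    """True when no node of the tree can be added without breaking the antichain."""
    if not nodes or not is_antichain(nodes):
        return False
    depth = max(len(p) for p in nodes)
    deepest_level = ("".join(bits) for bits in product("01", repeat=depth))
    return all(any(comparable(s, p) for p in nodes) for s in deepest_level)


def weight(nodes) -> Fraction:
    """Sum of 1 / 2^|p| over the given nodes, computed exactly."""
    return sum((Fraction(1, 2 ** len(p)) for p in nodes), Fraction(0))
```

For example, `is_maximal_finite_antichain({"00", "01", "1"})` holds, whereas `{"00", "01", "100"}` is an antichain that can still be extended, for instance by the node `101`.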

**Lemma 1.** *If A* ⊂ 𝕋 *is a maximal finite antichain in* 𝕋*, then*

$$\sum_{p \in A} \frac{1}{2^{|p|}} = 1 .$$

*Proof.* Each node *p* ≠ 𝟙 is a binary sequence which represents a natural number #*p*. Therefore, each *p* ≠ 𝟙 corresponds to the interval

$$\bar{p} = \left[ \frac{\#p}{2^{|p|}},\ \frac{\#p+1}{2^{|p|}} \right) \subset [0, 1] ,$$

and 𝟙 corresponds to *I* = [0, 1). The length of each interval *p̄* is 1/2<sup>|*p*|</sup>. For incomparable *p* and *q*, the corresponding intervals are disjoint: *p̄* ∩ *q̄* = ∅. Indeed, if *p̄* ∩ *q̄* ≠ ∅, then there must exist some *r* ∈ 𝕋 such that *r̄* ⊂ *p̄* ∩ *q̄*. Since *r̄* ⊂ *p̄*, we have *r* ≤ *p*, and similarly *r* ≤ *q*. This implies *p* ≤ *q* or *q* ≤ *p*, so they are comparable.

We now show that the measure of ⋃<sub>*p*∈*A*</sub> *p̄* is equal to 1. Clearly, it cannot be greater than 1, so if it is less, then let *u* ⊂ *I* \ ⋃<sub>*p*∈*A*</sub> *p̄* be an open interval. There must exist *s* ∈ 𝕋 such that *s̄* ⊂ *u*. If *s* is comparable to some *p* ∈ *A*, then *s̄* ∩ *p̄* ≠ ∅, so *s̄* ∩ ⋃<sub>*p*∈*A*</sub> *p̄* is non-empty, which contradicts *s̄* ⊂ *u*. Thus, assuming that the measure of ⋃<sub>*p*∈*A*</sub> *p̄* is less than 1, we have found *s* incomparable to all elements of *A*, which contradicts its maximality.

To complete the proof note that the length of each *p̄* is 1/2<sup>|*p*|</sup>, the measure of ⋃<sub>*p*∈*A*</sub> *p̄* is 1 and the intervals are all pairwise disjoint, so their lengths add up to 1.
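The interval argument of the proof can be traced on a concrete antichain. The following sketch is an illustration under stated assumptions, not the author's program: nodes are binary strings, the helper names are hypothetical, and exact rational arithmetic from the standard `fractions` module is used so that the sums are not affected by rounding.

```python
from fractions import Fraction


def interval(p: str) -> tuple[Fraction, Fraction]:
    """End points of the half-open interval [#p / 2^|p|, (#p + 1) / 2^|p|)."""
    k = int(p, 2) if p else 0  # #p, with the root mapped to 0
    scale = 2 ** len(p)
    return Fraction(k, scale), Fraction(k + 1, scale)


def total_weight(nodes) -> Fraction:
    """Sum of 1 / 2^|p| over the given nodes."""
    return sum((Fraction(1, 2 ** len(p)) for p in nodes), Fraction(0))


def tiles_unit_interval(nodes) -> bool:
    """True when the intervals of the nodes cover [0, 1) without overlapping."""
    spans = sorted(interval(p) for p in nodes)
    if not spans or spans[0][0] != 0 or spans[-1][1] != 1:
        return False
    return all(left[1] == right[0] for left, right in zip(spans, spans[1:]))


mfa = {"00", "01", "1"}            # a maximal finite antichain from Fig. 1
not_maximal = {"00", "01", "100"}  # an antichain that can still be extended
```

Here the intervals of `mfa` are [0, 1/4), [1/4, 1/2) and [1/2, 1), which tile the unit interval, so the weights 1/4 + 1/4 + 1/2 add up to 1. For `not_maximal` part of [0, 1) stays uncovered and the weights add up to 5/8 only.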

#### **2. Metasets**

In classical set theory a set either is an element of another set or it is not; there are no intermediate levels. This binary approach has serious limitations which make it difficult to apply to the representation of vague, imprecise data. Therefore, over the last decades there have been several attempts to invent a concept of set with a partial membership relation. Among the most successful ones are fuzzy sets (Zadeh, 1965), intuitionistic fuzzy sets (Atanassov, 1986) and rough sets (Pawlak, 1982). The metaset idea is a new approach to the problem.

One of the most significant characteristics of the metaset concept is its computer-oriented design. Definitions of fundamental notions, such as membership, equality or algebraic operations, may be formulated in a way which makes them easily implementable in programming languages (Starosta & Kosiński, 2009). This facilitates fast and efficient computer representation and processing of vague data. Additionally, several important theoretical results may be obtained for the metasets which are representable in computers, because of their finite structure. Some of them, such as Lemma 3, constitute the basis for the mechanism discussed here.

#### **2.1 Fundamental concepts**

The concept of metaset is strictly based on the classical Zermelo-Fraenkel set theory (ZFC). We define metaset as a set of ordered pairs. The first element of a pair is a member of the metaset, which is another metaset. The second element of the pair is a node of the binary tree which – informally speaking – specifies the membership degree of the first element in the metaset.

**Definition 1.** A metaset is a crisp set which is either the empty set ∅ or which has the form:

$$\tau = \{ \langle \sigma, p \rangle : \sigma \text{ is a metaset, } p \in \mathbb{T} \} .$$

The definition is recursive; however, it is well founded: the recursion starts from the empty set ∅ and terminates by the Axiom of Foundation in ZFC (Kunen, 1980). The first elements of the ordered pairs contained in a metaset are called its *potential elements*.
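Definition 1 translates quite directly into a data structure. The sketch below is one possible encoding, assumed here for illustration only: a finite metaset is a `frozenset` of `(metaset, node)` tuples, nodes are binary strings, and the names `EMPTY`, `is_node`, `is_metaset`, `metaset` and `potential_elements` are hypothetical.

```python
EMPTY = frozenset()  # the empty metaset


def is_node(p) -> bool:
    """A node of the binary tree is a finite string over '0' and '1'."""
    return isinstance(p, str) and set(p) <= {"0", "1"}


def is_metaset(x) -> bool:
    """Recursive check of Definition 1; the recursion bottoms out at EMPTY."""
    if not isinstance(x, frozenset):
        return False
    return all(
        isinstance(pair, tuple) and len(pair) == 2
        and is_metaset(pair[0]) and is_node(pair[1])
        for pair in x
    )


def metaset(*pairs) -> frozenset:
    """Build a metaset from (potential_element, condition) pairs."""
    tau = frozenset(pairs)
    if not is_metaset(tau):
        raise ValueError("every pair must be (metaset, binary string)")
    return tau


def potential_elements(tau) -> frozenset:
    """First elements of the ordered pairs contained in tau."""
    return frozenset(sigma for sigma, _ in tau)
```

Immutable `frozenset` values are used because a metaset must itself be usable as the first element of a pair inside another metaset, which requires hashable objects in Python.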

From the point of view of classical set theory, a metaset is a relation between a crisp set of other metasets and a set of nodes of the tree 𝕋. Therefore, we adopt some terminology associated with relations. For the given metaset *τ* the set of its potential elements:

$$\text{dom}(\tau) = \{\sigma \colon \langle \sigma, p \rangle \in \tau\} \tag{1}$$

is called the *domain* of the metaset *τ*. Its *range* is the following set:

$$\text{ran}(\tau) = \{ p \colon \langle \sigma, p \rangle \in \tau \} \tag{2}$$

The reader may confirm that *τ* ⊂ dom(*τ*) × ran(*τ*) ⊂ dom(*τ*) × 𝕋. For metasets *τ* and *σ* the set

$$\tau[\sigma] = \{ p \in \mathbb{T} \colon \langle \sigma, p \rangle \in \tau \}\tag{3}$$

is called the *image* of the metaset *τ* at the metaset *σ*. The image *τ*[*σ*] is the empty set ∅, whenever *σ* is not a potential element of *τ*.

**Example 1.** The simplest metaset is the empty set ∅. It may be a potential element of other metasets:

$$\begin{aligned} \tau &= \{ \langle \emptyset, p \rangle \} \ , & \qquad \tau[\emptyset] &= \{ p \} \ , & \qquad \text{dom}(\tau) &= \{ \emptyset \} \ , & \qquad \text{ran}(\tau) &= \{ p \} \ , \\ \sigma &= \{ \langle \emptyset, p \rangle, \langle \emptyset, q \rangle \} \ , & \qquad \sigma[\emptyset] &= \{ p, q \} \ , & \qquad \text{dom}(\sigma) &= \{ \emptyset \} \ , & \qquad \text{ran}(\sigma) &= \{ p, q \} \ , \\ \eta &= \{ \langle \tau, p \rangle, \langle \sigma, q \rangle \} \ , & \qquad \eta[\emptyset] &= \emptyset \ , & \qquad \text{dom}(\eta) &= \{ \tau, \sigma \} \ , & \qquad \text{ran}(\eta) &= \{ p, q \} \ . \end{aligned}$$

Clearly, *η*[*τ*] = { *p* }, *η*[*σ*] = { *q* } and, since ∅ ∉ dom(*η*), *η*[∅] = ∅.
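The domain, range and image from (1), (2) and (3) are one-line comprehensions once a metaset is stored as a set of pairs. The sketch below reproduces Example 1 under that assumption; the encoding (a `frozenset` of `(metaset, node)` tuples with nodes as binary strings) and the concrete conditions chosen for *p* and *q* are illustrative only.

```python
EMPTY = frozenset()  # the empty metaset


def dom(tau) -> frozenset:
    """Domain: the set of potential elements of tau, as in (1)."""
    return frozenset(sigma for sigma, _ in tau)


def ran(tau) -> frozenset:
    """Range: the set of conditions occurring in tau, as in (2)."""
    return frozenset(p for _, p in tau)


def image(tau, sigma) -> frozenset:
    """Image of tau at sigma: the conditions paired with sigma, as in (3)."""
    return frozenset(p for s, p in tau if s == sigma)


p, q = "0", "10"  # two arbitrary conditions, chosen only for illustration
tau = frozenset({(EMPTY, p)})
sigma = frozenset({(EMPTY, p), (EMPTY, q)})
eta = frozenset({(tau, p), (sigma, q)})
```

Note that the image is always a set of conditions, so `image(eta, tau)` is the singleton containing `p`, and `image(eta, EMPTY)` is empty because the empty metaset is not a potential element of `eta`.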

In this chapter we do not deal with metasets in general. We focus on very specific classes relevant to the character recognition problem. Narrowing the domain of discourse also simplifies the formulation of some results. We now introduce two classes of metasets used for the representation of characters and patterns.

Let *A* be a maximal finite antichain in 𝕋. A non-empty metaset of the form

$$
\chi \subset \{\emptyset\} \times A \tag{4}
$$

is called an *A*-*sample* metaset. Each non-empty subset *S* ⊂ *A* determines the *A*-sample metaset { ∅ } × *S*. *A*-sample metasets are used for representing character samples.

Let *P* be a finite set of *A*-sample metasets. A non-empty metaset of the form

$$
\pi \subset P \times A \tag{5}
$$

is called an *A*-*pattern* metaset. In other words, an *A*-pattern metaset has the form

$$\pi = \bigcup_{i=1}^{n} \{\chi_i\} \times P_i \tag{6}$$

where *χ<sub>i</sub>* are *A*-sample metasets and *P<sub>i</sub>* ⊂ *A* are non-empty sets for *i* = 1, . . . , *n*. *A*-pattern metasets are used for representing character patterns.
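The two classes can be built with small constructor functions that enforce the conditions in (4), (5) and (6). This is a sketch under assumptions, not the chapter's implementation: metasets are `frozenset`s of `(metaset, node)` tuples, nodes are binary strings, and the names `a_sample`, `is_a_sample` and `a_pattern` are hypothetical. How pixels of a character matrix are mapped to conditions is not covered here.

```python
EMPTY = frozenset()  # the empty metaset


def a_sample(conditions, antichain) -> frozenset:
    """Build the A-sample metaset {EMPTY} x S for a non-empty S within A."""
    s, a = frozenset(conditions), frozenset(antichain)
    if not s or not s <= a:
        raise ValueError("S must be a non-empty subset of the antichain A")
    return frozenset((EMPTY, p) for p in s)


def is_a_sample(chi, antichain) -> bool:
    """True when chi is a non-empty subset of {EMPTY} x A, as in (4)."""
    a = frozenset(antichain)
    return bool(chi) and all(sigma == EMPTY and p in a for sigma, p in chi)


def a_pattern(parts, antichain) -> frozenset:
    """Build the union of {chi_i} x P_i from (chi_i, P_i) pairs, as in (6)."""
    a = frozenset(antichain)
    pairs = set()
    for chi, conditions in parts:
        p_i = frozenset(conditions)
        if not is_a_sample(chi, a) or not p_i or not p_i <= a:
            raise ValueError("each part needs an A-sample and a non-empty P_i within A")
        pairs.update((chi, p) for p in p_i)
    if not pairs:
        raise ValueError("an A-pattern metaset must be non-empty")
    return frozenset(pairs)


A = {"00", "01", "10", "11"}  # level 2 of the tree, a maximal finite antichain
chi_1 = a_sample({"00", "01"}, A)
chi_2 = a_sample({"01", "10"}, A)
pattern = a_pattern([(chi_1, {"00", "11"}), (chi_2, {"10"})], A)
```

In this example `pattern` contains three ordered pairs: `chi_1` paired with `00` and with `11`, and `chi_2` paired with `10`.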

We now explain the fundamental technique of interpretation, which is used for defining relations on metasets. It also allows one to perceive a metaset as a "fuzzy" family of crisp sets. Each member of such a family represents some specific, particular point of view on the metaset.

**Definition 2.** Let *τ* be a metaset and let 𝒞 be a branch in the binary tree 𝕋. The interpretation of the metaset *τ*, given by the branch 𝒞, is the following crisp set:

$$\tau_{\mathcal{C}} = \{ \sigma_{\mathcal{C}} \colon \langle \sigma, p \rangle \in \tau \land p \in \mathcal{C} \} .$$

Thus, branches in 𝕋 allow for producing crisp sets out of the metaset. The family of crisp sets { *τ*<sub>𝒞</sub> : 𝒞 is a branch in 𝕋 } consists of interpretations of the metaset *τ*. Properties of these interpretations determine properties of the metaset.

Any interpretation of the empty metaset is the empty set itself, independently of the branch: ∅<sub>𝒞</sub> = ∅ for each branch 𝒞 ⊂ 𝕋. The process of producing the interpretation of a metaset consists of two stages. In the first stage we remove all the ordered pairs whose second elements are conditions which do not belong to the branch 𝒞. The second stage replaces the remaining pairs, whose second elements lie on the branch 𝒞, with interpretations of their first elements, which are other metasets. This two-stage process is repeated recursively on all the levels of the membership hierarchy. As the result we obtain a crisp set.

**Example 2.** Let *p* ∈ 𝕋 and let *τ* = { ⟨∅, *p*⟩ }. If 𝒞 is a branch, then

$$\begin{aligned} p \in \mathcal{C} &\to \tau_{\mathcal{C}} = \{ \emptyset_{\mathcal{C}} \} = \{ \emptyset \} , \\ p \notin \mathcal{C} &\to \tau_{\mathcal{C}} = \emptyset . \end{aligned}$$

Depending on the branch the metaset *τ* acquires different interpretations.
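The two-stage process of interpretation is naturally recursive. The sketch below is a possible rendering under stated assumptions: a metaset is a `frozenset` of `(metaset, node)` tuples, nodes are binary strings, and an infinite branch is represented by a finite binary string that is at least as long as the deepest condition in the metaset, which is sufficient because longer prefixes cannot change the result. The names `depth` and `interpretation` are hypothetical.

```python
EMPTY = frozenset()  # the empty metaset


def depth(tau) -> int:
    """Length of the longest condition occurring anywhere inside tau."""
    return max((max(len(p), depth(sigma)) for sigma, p in tau), default=0)


def interpretation(tau, branch: str) -> frozenset:
    """Crisp set obtained from tau along the branch that starts with `branch`."""
    if len(branch) < depth(tau):
        raise ValueError("the branch prefix is too short for this metaset")
    return frozenset(
        interpretation(sigma, branch)  # stage 2: interpret the first elements
        for sigma, p in tau
        if branch.startswith(p)        # stage 1: keep pairs whose condition lies on the branch
    )


tau = frozenset({(EMPTY, "01")})
on_branch = interpretation(tau, "01")   # the condition 01 lies on this branch
off_branch = interpretation(tau, "00")  # the condition 01 does not lie on this branch
```

Here `on_branch` is the singleton containing the empty set and `off_branch` is the empty set, in agreement with Example 2.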

An interpretation of an *A*-sample metaset is either the empty set ∅ or the singleton { ∅ }. An interpretation of an *A*-pattern metaset *η* = { ⟨*σ*, *p*⟩ }, where *σ* is an *A*-sample metaset, is given by

$$\eta_{\mathcal{C}} = \begin{cases} \emptyset & p \notin \mathcal{C} , \\ \{\emptyset\} & p \in \mathcal{C} \land \text{ran}(\sigma) \cap \mathcal{C} = \emptyset , \\ \{\{\emptyset\}\} & p \in \mathcal{C} \land \text{ran}(\sigma) \cap \mathcal{C} \neq \emptyset . \end{cases} \tag{7}$$

Therefore, an interpretation of any *A*-pattern metaset is one of: ∅, { ∅ }, { { ∅ } } or { ∅, { ∅ } }. For instance, if *ν* = { ⟨∅, 0⟩ }, *μ* = { ⟨∅, 111⟩ }, *τ* = { ⟨*ν*, 1⟩, ⟨*μ*, 11⟩ } and 𝒞 = { 𝟙, 1, 11, 111, . . . } is the rightmost branch, then *ν*<sub>𝒞</sub> = ∅, *μ*<sub>𝒞</sub> = { ∅ }, so *τ*<sub>𝒞</sub> = { ∅, { ∅ } }.

We now introduce basic set-theoretic relations for metasets. All the relations are defined using the same scheme, by referring to interpretations. We start with membership.

**Definition 3.** Let *τ*, *σ* be metasets and let *p* ∈ 𝕋. We say that *σ* belongs to *τ* under the condition *p*, if *σ*<sub>𝒞</sub> ∈ *τ*<sub>𝒞</sub> holds for each branch 𝒞 containing *p*. We use the notation *σ* ε<sub>*p*</sub> *τ*.

Note that in fact we define an infinite number of membership relations here, each designated by a different condition. The membership under the root condition, *σ* ε<sub>𝟙</sub> *τ*, corresponds to the crisp, classical membership. The root 𝟙 designates the highest membership degree, since it is the largest element in 𝕋. Stronger conditions designate lower degrees of membership.

We also define an independent set of non-membership relations. The reason for this is that ¬ *σ* ε<sub>*p*</sub> *τ* does not imply that *σ*<sub>𝒞</sub> ∉ *τ*<sub>𝒞</sub> holds for each branch 𝒞 containing *p*. It merely means that *σ*<sub>𝒞</sub> ∈ *τ*<sub>𝒞</sub> does not hold for every such branch; there may still exist branches for which it is true.

**Definition 4.** Let *τ*, *σ* be metasets and let *p* ∈ 𝕋. We say that *σ* is not a member of *τ* under the condition *p*, if *σ*<sub>𝒞</sub> ∉ *τ*<sub>𝒞</sub> holds for each branch 𝒞 containing *p*. We use the notation *σ* ε̸<sub>*p*</sub> *τ*.
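For finite metasets both relations can be decided by enumerating branches, because an interpretation depends only on a finite initial part of the branch. The following sketch illustrates this under assumptions that are not taken from the chapter: metasets are `frozenset`s of `(metaset, node)` tuples, nodes are binary strings, a branch is cut off at the depth of the deepest condition involved, and the function names are hypothetical.

```python
from itertools import product

EMPTY = frozenset()  # the empty metaset


def depth(tau) -> int:
    """Length of the longest condition occurring anywhere inside tau."""
    return max((max(len(p), depth(sigma)) for sigma, p in tau), default=0)


def interpretation(tau, branch: str) -> frozenset:
    """Crisp set obtained from tau along the branch that starts with `branch`."""
    return frozenset(
        interpretation(sigma, branch) for sigma, p in tau if branch.startswith(p)
    )


def branches_through(p: str, d: int) -> list[str]:
    """All length-d initial parts of branches containing p (requires d >= |p|)."""
    return [p + "".join(bits) for bits in product("01", repeat=d - len(p))]


def belongs(sigma, tau, p: str) -> bool:
    """Definition 3: membership holds on every branch containing p."""
    d = max(depth(sigma), depth(tau), len(p))
    return all(
        interpretation(sigma, b) in interpretation(tau, b)
        for b in branches_through(p, d)
    )


def does_not_belong(sigma, tau, p: str) -> bool:
    """Definition 4: membership fails on every branch containing p."""
    d = max(depth(sigma), depth(tau), len(p))
    return all(
        interpretation(sigma, b) not in interpretation(tau, b)
        for b in branches_through(p, d)
    )


tau = frozenset({(EMPTY, "0")})
```

With this `tau`, the empty metaset belongs to `tau` under the condition `0` and does not belong to it under the condition `1`. Under the root condition (the empty string) neither `belongs` nor `does_not_belong` holds, which shows why non-membership is defined as a separate relation rather than as the negation of membership.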


It might seem strange to the reader that two metasets may be in membership and non-membership relations simultaneously. The relations must be qualified by incomparable conditions, though.

**Example 3.** Let *τ* = { ⟨∅, 0⟩ }. We check that ∅ ε<sub>0</sub> *τ* ∧ ∅ ε̸<sub>1</sub> *τ*. Indeed, if C<sup>0</sup> is a branch containing 0, then ∅<sub>C<sup>0</sup></sub> = ∅ ∈ { ∅ } = *τ*<sub>C<sup>0</sup></sub>. Similarly, if C<sup>1</sup> is a branch containing 1, then ∅<sub>C<sup>1</sup></sub> = ∅ ∉ ∅ = *τ*<sub>C<sup>1</sup></sub>. Also, ¬ ∅ ε<sub>𝟙</sub> *τ* ∧ ¬ ∅ ε̸<sub>𝟙</sub> *τ*, since neither ∅<sub>C</sub> ∈ *τ*<sub>C</sub> holds for each branch C containing 𝟙, nor ∅<sub>C</sub> ∉ *τ*<sub>C</sub> holds for each such branch.

As we see, ¬ *σ* ε<sub>*p*</sub> *τ* does not completely exclude the membership of *σ* in *τ*, even for *p* = 𝟙. The fact that ¬ *σ* ε<sub>𝟙</sub> *τ* holds does not contradict *σ* ε<sub>*p*</sub> *τ* for some *p* ∈ 𝕋. It merely says that *σ* cannot belong to *τ* under the condition 𝟙. For incomparable conditions *p*, *q* it is possible that *σ* ε̸<sub>*p*</sub> *τ* and at the same time *σ* ε<sub>*q*</sub> *τ*. But it is not true that *σ* ε̸<sub>*p*</sub> *τ* ∧ *σ* ε<sub>*p*</sub> *τ* for any *p*.

Analogously, by referring to interpretations, we define two sets of equality relations.

**Definition 5.** Let *p* ∈ 𝕋 and let *τ*, *σ* be metasets. We say that *σ* is equal to *τ* under the condition *p*, if *σ*<sub>C</sub> = *τ*<sub>C</sub> holds for each branch C containing *p*. We use the notation *σ* ≈<sub>*p*</sub> *τ*.

**Definition 6.** Let *p* ∈ 𝕋 and let *τ*, *σ* be metasets. We say that *σ* is different from *τ* under the condition *p*, if *σ*<sub>C</sub> ≠ *τ*<sub>C</sub> holds for each branch C containing *p*. We use the notation *σ* ≉<sub>*p*</sub> *τ*.

As with conditional membership, it is possible that *σ* ≈<sub>*p*</sub> *τ* ∧ *σ* ≉<sub>*q*</sub> *τ* for some metasets *σ*, *τ* and *p*, *q* ∈ 𝕋.

**Example 4.** Let *τ* = { ⟨∅, 𝟙⟩ } and *η* = { ⟨∅, 1⟩ }. For a branch C containing 0 we have *τ*<sub>C</sub> = { ∅ } and *η*<sub>C</sub> = ∅. On the other hand, if C contains 1, then we have *τ*<sub>C</sub> = { ∅ } = *η*<sub>C</sub>. Thus, *τ* ≉<sub>0</sub> *η* and *τ* ≈<sub>1</sub> *η*. However, ¬ *τ* ≈<sub>𝟙</sub> *η* ∧ ¬ *τ* ≉<sub>𝟙</sub> *η*.
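The conditional relations of Definitions 3 to 6 can be checked by brute force for small metasets. The following is a minimal sketch, not the author's implementation. It assumes that the interpretation of a metaset along a branch collects the interpretations of those members whose condition lies on the branch (the definition of interpretation is not restated in this section), that nodes are encoded as binary strings with the empty string as the root, and that branches may be cut off at a fixed depth. The truncation is exact only when no condition involved is longer than that depth. All function names are hypothetical.

```python
from itertools import product

ROOT = ""            # the root condition, encoded as the empty string (assumption)
EMPTY = frozenset()  # the empty metaset

def branches(depth):
    """All branches of the binary tree, cut off at the given depth."""
    return ["".join(bits) for bits in product("01", repeat=depth)]

def interpret(metaset, branch):
    """Interpretation of a metaset along a branch.

    A metaset is a frozenset of (member, condition) pairs. A condition lies
    on the branch when it is a prefix of the branch string.
    """
    return frozenset(
        interpret(member, branch)
        for member, condition in metaset
        if branch.startswith(condition)
    )

def on_all_branches(p, relation, sigma, tau, depth=3):
    """True when relation(sigma_C, tau_C) holds for every branch C through p."""
    return all(
        relation(interpret(sigma, b), interpret(tau, b))
        for b in branches(depth)
        if b.startswith(p)
    )

def member(sigma, tau, p):       # Definition 3
    return on_all_branches(p, lambda s, t: s in t, sigma, tau)

def non_member(sigma, tau, p):   # Definition 4
    return on_all_branches(p, lambda s, t: s not in t, sigma, tau)

def equal(sigma, tau, p):        # Definition 5
    return on_all_branches(p, lambda s, t: s == t, sigma, tau)

def different(sigma, tau, p):    # Definition 6
    return on_all_branches(p, lambda s, t: s != t, sigma, tau)

# Example 3: tau consists of the empty metaset under the condition 0.
tau3 = frozenset({(EMPTY, "0")})
print(member(EMPTY, tau3, "0"), non_member(EMPTY, tau3, "1"))    # True True
print(member(EMPTY, tau3, ROOT), non_member(EMPTY, tau3, ROOT))  # False False

# Example 4: tau holds the empty metaset under the root, eta under 1.
tau4 = frozenset({(EMPTY, ROOT)})
eta4 = frozenset({(EMPTY, "1")})
print(different(tau4, eta4, "0"), equal(tau4, eta4, "1"))        # True True
print(equal(tau4, eta4, ROOT), different(tau4, eta4, ROOT))      # False False
```

Under the root condition neither relation of a pair holds in these examples, which is the hesitancy that the separate non-membership and difference relations are meant to capture.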

The following lemma is the metaset version of an obvious fact known from crisp set theory: *x* = *y* ∧ *y* ∈ *z* → *x* ∈ *z*.

**Lemma 2.** *If p* ∈ 𝕋 *and τ, σ, λ are metasets such that τ* ≈<sub>*p*</sub> *σ and σ* ε<sub>*p*</sub> *λ, then also τ* ε<sub>*p*</sub> *λ.*

*Proof.* If C is an arbitrary branch containing *p*, then by the assumptions *τ*<sub>C</sub> = *σ*<sub>C</sub> and *σ*<sub>C</sub> ∈ *λ*<sub>C</sub>. Therefore, also *τ*<sub>C</sub> ∈ *λ*<sub>C</sub>, which implies *τ* ε<sub>*p*</sub> *λ*.

The certainty grades for relations on metasets are represented by sets of nodes of the binary tree and they may be evaluated as real numbers. We do not develop the general theory here; the interested reader is referred to (Starosta, 2010). Instead, we show how to evaluate the degrees of membership, non-membership, equality and difference for *A*-sample metasets and *A*-pattern metasets, when the maximal finite antichain *A* is fixed. Let *σ*, *η* be *A*-sample metasets and let *τ* be an *A*-pattern metaset. The following sets contained in *A*

$$\mathbf{M}(\sigma,\tau) = \left\{\, p \in A \colon \sigma \mathrel{\epsilon_p} \tau \,\right\} \ , \tag{8}$$

$$\mathbf{N}(\sigma,\tau) = \left\{\, p \in A \colon \sigma \mathrel{\not\epsilon_p} \tau \,\right\} \ , \tag{9}$$

$$\mathbf{E}(\sigma,\eta) = \left\{\, p \in A \colon \sigma \approx_p \eta \,\right\} \ , \tag{10}$$

$$\mathbf{D}(\sigma,\eta) = \left\{\, p \in A \colon \sigma \not\approx_p \eta \,\right\} \ , \tag{11}$$


are called the *membership*, *non-membership*, *equality* and *difference sets* for *σ* and *τ* (or *η*), respectively. The values

$$\mathbf{m}(\sigma,\tau) = \sum_{p \in \mathbf{M}(\sigma,\tau)} \frac{1}{2^{|p|}} \ , \tag{12}$$

$$\mathbf{n}(\sigma,\tau) = \sum_{p \in \mathbf{N}(\sigma,\tau)} \frac{1}{2^{|p|}} \ , \tag{13}$$

$$\mathbf{e}(\sigma,\eta) = \sum_{p \in \mathbf{E}(\sigma,\eta)} \frac{1}{2^{|p|}} \ , \tag{14}$$

$$\mathbf{d}(\sigma,\eta) = \sum_{p \in \mathbf{D}(\sigma,\eta)} \frac{1}{2^{|p|}} \ , \tag{15}$$

are called the *membership*, *non-membership*, *equality* and *difference values* of *σ* in *τ* (or *η*), respectively. Clearly, by Lemma 1, they all range between 0 and 1, inclusive.
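Equations (12) to (15) all apply the same weighting to a set of conditions, so one helper covers them. The sketch below assumes that nodes are encoded as binary strings and that |*p*| is the length of *p*, i.e. its level in the tree. The antichain and the membership set used in the example are hypothetical, chosen only to show the arithmetic.

```python
from fractions import Fraction

def degree(conditions):
    """Sum of 1 / 2**|p| over a set of conditions given as binary strings."""
    return sum((Fraction(1, 2 ** len(p)) for p in conditions), Fraction(0))

# Hypothetical data: A is the second level of the tree, M and N split it.
A = {"00", "01", "10", "11"}
M = {"00", "01", "10"}   # conditions under which membership holds
N = A - M                # conditions under which non-membership holds

m, n = degree(M), degree(N)
print(m, n)              # 3/4 1/4

# An antichain with nodes from different levels weights its nodes unequally.
print(degree({"0"}), degree({"10", "11"}))   # 1/2 1/2
```

Exact fractions are used instead of floats so that sums over an antichain can be compared with 1 without rounding issues.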

It is worth stressing that *A*-sample metasets and *A*-pattern metasets have the following important property.

**Lemma 3.** *Let A be a maximal finite antichain. Let σ, η be arbitrary A-sample metasets and let τ be an arbitrary A-pattern metaset. The following equations hold:*

$$\mathbf{m}(\sigma,\tau) + \mathbf{n}(\sigma,\tau) = 1 \quad , \tag{16}$$

$$\mathbf{e}(\sigma,\eta) + \mathbf{d}(\sigma,\eta) = 1 \; . \tag{17}$$

*Proof.* First, observe that M(*σ*, *τ*) ∩ N(*σ*, *τ*) = ∅ and E(*σ*, *η*) ∩ D(*σ*, *η*) = ∅. Indeed, it is not possible that for some *p* ∈ *A* both *σ* ε<sub>*p*</sub> *τ* and *σ* ε̸<sub>*p*</sub> *τ* hold, or both *σ* ≈<sub>*p*</sub> *η* and *σ* ≉<sub>*p*</sub> *η*. Therefore, by Lemma 1, we may reformulate the thesis as follows:

$$\mathbf{M}(\sigma,\tau) \cup \mathbf{N}(\sigma,\tau) = A \ , \tag{18}$$

$$\mathbf{E}(\sigma,\eta) \cup \mathbf{D}(\sigma,\eta) = A \ . \tag{19}$$

To prove (18) it is enough to show that for each *p* ∈ *A* either *σ* ε<sub>*p*</sub> *τ* or *σ* ε̸<sub>*p*</sub> *τ* is true. In other words, either *σ*<sub>C</sub> ∈ *τ*<sub>C</sub> holds for all branches C containing *p*, or *σ*<sub>C</sub> ∉ *τ*<sub>C</sub> holds for all such branches. Clearly, for any branch C either *σ*<sub>C</sub> is a member of *τ*<sub>C</sub> or not; the question is whether the (non-)membership is maintained for all interpretations determined by a *p* ∈ *A*. This is true for an *A*-sample metaset *σ* and an *A*-pattern metaset *τ*, since ran(*σ*) ⊂ *A* and ran(*τ*) ⊂ *A* and also ⋃<sub>*η*∈dom(*τ*)</sub> ran(*η*) ⊂ *A*. Therefore, there exist no conditions stronger than *p* which could affect the interpretations. In other words, if C′ and C″ are different branches containing *p* ∈ *A*, then *τ*<sub>C′</sub> = *τ*<sub>C″</sub> and *σ*<sub>C′</sub> = *σ*<sub>C″</sub>. The proof of (19) is analogous.

The lemma says that there is no hesitancy in membership or equality for such metasets. This is not true for metasets in general. There exist metasets *α*, *β* with infinite ranges such that for any *p* ∈ 𝕋 neither *α* ε<sub>*p*</sub> *β* nor *α* ε̸<sub>*p*</sub> *β* is true; see (Starosta, 2010) for details. Translated into the language of character recognition, the lemma says that for each pixel of a character we may decide whether it matches some pattern (or another character) or not. There is no doubt about it.


#### **2.2 Properties relevant to character recognition**

In this section we prove some technical facts strictly relevant to the character recognition mechanism. We refer to them in the sequel. The proofs are not required for understanding the idea, so they may be skipped on first reading. We supply them for mathematical completeness and clarity.

The following lemma says that for two given *A*-sample metasets *τ* and *σ*, their conditional difference is determined by the elements of the symmetric difference of their ranges, ran(*τ*) △ ran(*σ*), whereas their conditional equality is determined by the complement in *A* of the symmetric difference, *A* \ (ran(*τ*) △ ran(*σ*)).

We may express this property in terms of character recognition as follows. When two characters are compared, the result is affected not only by the pixels that belong to both of them, but also by the pixels that belong to the background of both. If a pixel belongs to one of the characters and the same pixel forms the background of the other character, then this pixel asserts the difference between the characters.

**Lemma 4.** *Let A be a finite maximal antichain in* 𝕋 *and let S*, *T* ⊂ *A be not empty. Let τ* = { ∅ } × *T and σ* = { ∅ } × *S. For R* = (*S* ∩ *T*) ∪ ((*A* \ *S*) ∩ (*A* \ *T*)) *the following implications hold:*

$$
r \in R \to \tau \approx_r \sigma \ , \tag{20}
$$

$$
r \in A \setminus R \to \tau \not\approx_r \sigma \ . \tag{21}
$$

*Proof.* Assume that *r* ∈ *S* ∩ *T*. If C is a branch containing *r*, then clearly *τ*<sub>C</sub> = { ∅ } = *σ*<sub>C</sub>, and therefore *τ* ≈<sub>*r*</sub> *σ*. If *r* ∈ (*A* \ *S*) ∩ (*A* \ *T*) and C is a branch containing *r*, then *τ*<sub>C</sub> = ∅ = *σ*<sub>C</sub>, so *τ* ≈<sub>*r*</sub> *σ* holds too. This proves (20).

To prove (21) note that:

$$\begin{aligned} A \setminus R &= A \setminus \bigl( (T \cap S) \cup ((A \setminus T) \cap (A \setminus S)) \bigr) \\ &= \bigl( A \setminus (T \cap S) \bigr) \cap \bigl( A \setminus ((A \setminus T) \cap (A \setminus S)) \bigr) \\ &= \bigl( A \setminus (T \cap S) \bigr) \cap (T \cup S) \\ &= \bigl( (A \setminus T) \cup (A \setminus S) \bigr) \cap (T \cup S) \\ &= \bigl( (A \setminus T) \cap S \bigr) \cup \bigl( (A \setminus S) \cap T \bigr) \\ &= (S \setminus T) \cup (T \setminus S) \\ &= S \,\triangle\, T \ . \end{aligned}$$

If *r* ∈ (*A* \ *T*) ∩ *S* and C is a branch containing *r*, then *τ*<sub>C</sub> = ∅ and *σ*<sub>C</sub> = { ∅ }, so *τ* ≉<sub>*r*</sub> *σ*. Similarly, if *r* ∈ (*A* \ *S*) ∩ *T*, then *τ*<sub>C</sub> = { ∅ } and *σ*<sub>C</sub> = ∅, so *τ* ≉<sub>*r*</sub> *σ*. Thus, for *r* ∈ *A* \ *R* we obtain *τ* ≉<sub>*r*</sub> *σ*.

The set *R* is the equality set for *τ* and *σ*, and *A* \ *R* is the difference set:

$$R = \mathbf{E}(\tau, \sigma) \ , \tag{22}$$

$$A \setminus R = \mathbf{D}(\tau, \sigma) \ . \tag{23}$$

Lemma 4 enables evaluation of the equality degree of metasets representing character samples, i.e., the similarity of two characters.
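Lemma 4 reduces the comparison of two character samples to plain set operations on the antichain. The sketch below illustrates this under the assumptions that nodes are binary strings, that each node of *A* stands for one pixel, and that *S* and *T* list the nodes of the pixels belonging to each character. The 2 × 2 image data is hypothetical.

```python
from fractions import Fraction

def degree(conditions):
    """Sum of 1 / 2**|p| over a set of conditions given as binary strings."""
    return sum((Fraction(1, 2 ** len(p)) for p in conditions), Fraction(0))

def equality_set(A, S, T):
    """R = (S & T) | ((A - S) & (A - T)): shared ink plus shared background."""
    A, S, T = set(A), set(S), set(T)
    return (S & T) | ((A - S) & (A - T))

def difference_set(A, S, T):
    """A - R, which equals the symmetric difference of S and T."""
    return set(A) - equality_set(A, S, T)

# Hypothetical 2 x 2 image: one node of the second tree level per pixel.
A = {"00", "01", "10", "11"}
S = {"00", "01"}   # pixels of the first character
T = {"00", "10"}   # pixels of the second character

R = equality_set(A, S, T)
print(sorted(R))                            # ['00', '11']
print(sorted(difference_set(A, S, T)))      # ['01', '10']
print(degree(R), degree(difference_set(A, S, T)))   # 1/2 1/2
```

The pixel 11 counts towards equality although it belongs to neither character, which is the point made above about the shared background.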

We now prove the main result, which shows the construction of the membership and non-membership sets for the given *A*-sample metaset and *A*-pattern metaset. In other words, it allows for evaluation of the similarity degree of a character testing sample (CTS) to the compound character pattern (CCP).

In the following theorem the metaset *σ* represents the testing sample (CTS), and *ρ* is the compound character pattern (CCP) built up of potential elements *π*<sup>*i*</sup> representing characters. The sets *P*<sup>*i*</sup> and *S* constitute the structures of the pattern samples and the input sample. The sets *Q*<sup>*i*</sup> represent equality degrees of the CTS and CCP elements, and the sets *R*<sup>*i*</sup> represent the qualities of CCP members.

**Theorem 5.** *Let A be a maximal finite antichain in* 𝕋 *and let i* = 1, …, *k. Let P*<sup>*i*</sup>, *R*<sup>*i*</sup>, *S* ⊂ *A be not empty. Let σ* = { ∅ } × *S, π*<sup>*i*</sup> = { ∅ } × *P*<sup>*i*</sup> *and ρ* = ⋃<sub>*i*=1</sub><sup>*k*</sup> { *π*<sup>*i*</sup> } × *R*<sup>*i*</sup> *be metasets. For the sets Q*<sup>*i*</sup> = (*S* ∩ *P*<sup>*i*</sup>) ∪ ((*A* \ *S*) ∩ (*A* \ *P*<sup>*i*</sup>)) *and U* = ⋃<sub>*i*=1</sub><sup>*k*</sup> (*Q*<sup>*i*</sup> ∩ *R*<sup>*i*</sup>)*, the following holds:*

$$
q \in Q^i \to \sigma \approx_q \pi^i \ , \tag{24}
$$

$$
q \in A \setminus Q^i \to \sigma \not\approx_q \pi^i \ , \tag{25}
$$

$$
u \in U \to \sigma \mathrel{\epsilon_u} \rho \ , \tag{26}
$$

$$
u \in A \setminus U \to \sigma \mathrel{\not\epsilon_u} \rho \ . \tag{27}
$$

*Proof.* Lemma 4 proves (24) and (25).

To prove (26) take *u* ∈ *U*. There exists *i* ∈ { 1, …, *k* } such that *u* ∈ *R*<sup>*i*</sup> ∩ *Q*<sup>*i*</sup>. Since *u* ∈ *Q*<sup>*i*</sup>, by (24) this implies *σ* ≈<sub>*u*</sub> *π*<sup>*i*</sup>. By the construction of *ρ*, since *u* ∈ *R*<sup>*i*</sup>, we have ⟨*π*<sup>*i*</sup>, *u*⟩ ∈ *ρ*, and by Definition 3 we obtain *π*<sup>*i*</sup> ε<sub>*u*</sub> *ρ*. Thus, by Lemma 2, *σ* ε<sub>*u*</sub> *ρ*.

To prove (27) let *R* = ⋃<sub>*i*=1</sub><sup>*k*</sup> *R*<sup>*i*</sup> and let *Q̄*<sup>*i*</sup> = *A* \ *Q*<sup>*i*</sup>. We may split each *R*<sup>*i*</sup> into two parts: *R*<sup>*i*</sup> = (*R*<sup>*i*</sup> \ *Q*<sup>*i*</sup>) ∪ (*R*<sup>*i*</sup> ∩ *Q*<sup>*i*</sup>) = (*R*<sup>*i*</sup> ∩ *Q̄*<sup>*i*</sup>) ∪ (*R*<sup>*i*</sup> ∩ *Q*<sup>*i*</sup>). Therefore,

$$R = \bigcup_{i=1}^{k} \left( R^{i} \cap \bar{Q}^{i} \right) \cup \bigcup_{i=1}^{k} \left( R^{i} \cap Q^{i} \right) = \bigcup_{i=1}^{k} \left( R^{i} \cap \bar{Q}^{i} \right) \cup U \ . \tag{28}$$

Let *u* ∈ *A* \ *U* and let C be a branch containing *u*. Note that *U* ⊂ *R* ⊂ *A*, so we consider two cases: *u* ∈ *R* \ *U* and *u* ∈ *A* \ *R*.

If *u* ∈ *R* \ *U*, then let *I* ⊂ { 1, …, *k* } be the set of all those *i* for which *u* ∈ *R*<sup>*i*</sup> ∩ *Q̄*<sup>*i*</sup>. Since *u* ∈ C and for each *i* ∈ *I* the intersection *R*<sup>*i*</sup> ∩ C contains at most one element (which is *u*), by Definition 2 we have

$$\rho_{\mathcal{C}} = \left\{ \pi^{i}_{\mathcal{C}} \colon 1 \le i \le k \land R^{i} \cap \mathcal{C} \ne \emptyset \right\} = \left\{ \pi^{i}_{\mathcal{C}} \colon 1 \le i \le k \land u \in R^{i} \right\} = \left\{ \pi^{i}_{\mathcal{C}} \colon i \in I \right\} \ . \tag{29}$$

The last equality is implied by the following (since *u* ∉ *U*):

$$\left\{ i \colon u \in R^{i} \right\} = \left\{ i \colon u \in R^{i} \cap \bar{Q}^{i} \right\} \cup \left\{ i \colon u \in R^{i} \cap Q^{i} \right\} = I \cup \emptyset = I \ . \tag{30}$$

Thus, for *i* ∈ *I* we have *π*<sup>*i*</sup><sub>C</sub> ∈ *ρ*<sub>C</sub>. However, for *i* ∈ *I* we also have *u* ∉ *Q*<sup>*i*</sup>, so by (25) *σ* ≉<sub>*u*</sub> *π*<sup>*i*</sup> holds and consequently *σ*<sub>C</sub> ≠ *π*<sup>*i*</sup><sub>C</sub>. Since *σ*<sub>C</sub> is different from all the members of *ρ*<sub>C</sub>, we get *σ*<sub>C</sub> ∉ *ρ*<sub>C</sub> for any branch C ∋ *u*, which gives *σ* ε̸<sub>*u*</sub> *ρ*. This proves (27) for the case when *u* ∈ *R* \ *U*. If *u* ∈ *A* \ *R*, then for C ∋ *u* we have *ρ*<sub>C</sub> = ∅, so *σ* ε̸<sub>*u*</sub> *ρ* for any *σ*, which settles the second case of (27).
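The construction of *U* in Theorem 5 can be sketched with ordinary set operations. The code below is an illustration under stated assumptions, not the author's program: nodes are binary strings, a compound pattern is passed as a list of (*P*<sup>*i*</sup>, *R*<sup>*i*</sup>) pairs, and the sample and the two patterns are hypothetical 2 × 2 images.

```python
from fractions import Fraction

def degree(conditions):
    """Sum of 1 / 2**|p| over a set of conditions given as binary strings."""
    return sum((Fraction(1, 2 ** len(p)) for p in conditions), Fraction(0))

def membership_set(A, S, patterns):
    """U = union over i of (Q_i & R_i), with Q_i built as in Theorem 5.

    patterns is a list of (P_i, R_i) pairs: P_i is the structure of the
    i-th pattern sample and R_i is the set of conditions attached to it.
    """
    A, S = set(A), set(S)
    U = set()
    for P, R in patterns:
        P, R = set(P), set(R)
        Q = (S & P) | ((A - S) & (A - P))   # equality set of sample and pattern
        U |= Q & R
    return U

# Hypothetical data on the second level of the tree.
A = {"00", "01", "10", "11"}
S = {"00", "01"}                                  # testing sample
patterns = [
    ({"00", "01"}, {"00", "01"}),                 # P_1, R_1
    ({"00", "10"}, {"10", "11"}),                 # P_2, R_2
]

U = membership_set(A, S, patterns)
print(sorted(U))                   # ['00', '01', '11']
print(degree(U), degree(A - U))    # 3/4 1/4
```

Each *R*<sup>*i*</sup> acts as a filter here: a condition on which the sample equals the *i*-th pattern contributes to *U* only if it also belongs to *R*<sup>*i*</sup>.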

The set *U* is the membership set for *σ* in *ρ*, and *A* \ *U* is the non-membership set:

$$U = \mathrm{M}(\sigma, \rho) \; , \tag{31}$$

$$A \setminus U = \mathrm{N}(\sigma, \rho) \; . \tag{32}$$

The sequence of equality sets *Q<sup>i</sup>* = E(*σ*, *π<sup>i</sup>*) enables evaluation of the equality degrees of the *A*-sample metaset *σ* and the potential elements *π<sup>i</sup>* of the *A*-pattern metaset *ρ*. They show how the overall similarity degree is distributed among the pattern elements.

#### **3. Character recognition with metasets**

In this section we explain the core idea of applying metasets to the recognition of characters. We show how to represent characters and compound character patterns as metasets. Then we show how to calculate the appropriate membership and equality degrees and interpret them as quality grades of the input samples.

The procedure we discuss here involves two stages. In the first stage we define the compound character pattern (CCP). It represents a single character and comprises a number of different samples of that character. The samples are graded with quality grades.

In the second stage we supply character testing samples (CTS) and calculate the result, which is the similarity degree of the CTS to the CCP. The similarity degree tells how close the CTS is to the character represented by the CCP. Besides the overall similarity degree we also obtain the sequence of similarity degrees of the CTS to each member of the CCP. These degrees show how close the input sample is to each element of the compound pattern.

The compound character pattern is represented by a metaset whose potential elements represent the particular character samples of the pattern. The testing sample is represented by a metaset too. The resulting similarity degree is the membership degree of the CTS in the CCP. The additional similarity degrees of the CTS to pattern elements are partial equality degrees of the CTS and the potential elements of the CCP.

One of the goals of this section is to convince the reader that partial membership and equality degrees of metasets encoding character samples properly reflect the human perception of similarity of characters.

#### **3.1 Representing characters as metasets**

Characters are displayed on the matrix *X*<sup>*c*</sup><sub>*r*</sub> comprised of *r* rows and *c* columns (shortly: *X*). The natural numbers *r* and *c* may be arbitrary; however, they must remain constant throughout the matching process: all the character samples in the CCP as well as all the CTS input samples must use the same matrix dimensions. We focus on monochromatic images here, so each cell of the matrix is in one of two states: selected cells belong to the character and deselected cells form the background. For a given character *a* displayed on the matrix, the set of selected cells is denoted by *X<sub>a</sub>*.

Prior to defining character samples, a mapping *m*: *X* ↦ 𝕋 between matrix cells and nodes of the binary tree 𝕋 must be established. To each cell of the matrix a node of the binary tree must be assigned so that the set of assigned nodes, denoted by *m*(*X*), forms a maximal finite antichain (MFA) *A* in the tree 𝕋. The assignment of nodes to cells is arbitrary: no special ordering is required. The antichain and the mapping are constant for the whole character matching process, i.e. all the CTS and CCP samples use the same *A* and *m*. Note that since the nodes assigned to cells form an MFA, any branch in the tree contains exactly one assigned node.


The simplest example of such an assignment is when *r* · *c* = 2<sup>*k*</sup> for some *k*. In that case the nodes of the *k*-th level of the binary tree may be assigned to the cells in an arbitrary way. We call such a one-to-one mapping between the matrix and some level of 𝕋 an *even* mapping. Figure 2 shows a sample 4 × 4 matrix with a mapping *m*: *X*<sup>4</sup><sub>4</sub> ↦ 𝕋<sub>4</sub> onto level 4 of the tree. For simplicity, most examples will be based on this 4 × 4 matrix and this mapping.


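$$\begin{array}{cccc} 0000 & 0001 & 0010 & 0011 \\ 0100 & 0101 & 0110 & 0111 \\ 1000 & 1001 & 1010 & 1011 \\ 1100 & 1101 & 1110 & 1111 \end{array}$$

Fig. 2. A standard mapping of the level 4 of the binary tree to cells of the 4 × 4 matrix.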

When *r* · *c* ≠ 2<sup>*k*</sup> for every *k* ∈ ℕ, the cells of the matrix must be mapped to nodes from different levels of 𝕋, since the *k*-th level contains exactly 2<sup>*k*</sup> nodes. In any case, the image *m*(*X*<sup>*c*</sup><sub>*r*</sub>) must be an MFA. We call such a mapping *uneven*. See Fig. 3 for an example of an uneven 3 × 4 mapping.


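$$\begin{array}{cccc} 11100 & 000 & 001 & 11110 \\ 1100 & 010 & 011 & 1101 \\ 11101 & 100 & 101 & 11111 \end{array}$$

Fig. 3. Mapping of some antichain in 𝕋 to cells of the 3 × 4 matrix.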

For an even mapping the placement of particular nodes is largely irrelevant. When the mapping is uneven, on the other hand, assigning nodes from different levels to cells imposes the following interpretation. Parts of the matrix which are more important for the particular character, and which we want to stress by distinguishing them from the rest, are associated with nodes closer to the root (the weaker conditions). Cells of less importance contain nodes from lower levels of the tree (the stronger conditions). Weaker conditions have more impact on the resulting membership and equality degrees than stronger ones (cf. Equations 12–15). For instance, we might be particularly interested in properly recognizing the dot over the letter 'i'. In that case we may use the assignment depicted in Fig. 4. The reader is encouraged to check that the nodes form a maximal antichain. The cells containing the nodes 10 and 110 are more sensitive to errors than other cells and they influence the resulting similarity degree more than others.


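$$\begin{array}{ccc} 0000 & 10 & 0100 \\ 0001 & 110 & 0101 \\ 0010 & 1110 & 0110 \\ 0011 & 1111 & 0111 \end{array}$$

Fig. 4. Simple assignment for stressing the dot over 'i'.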
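The check that a set of nodes forms a maximal antichain can be automated. The following is a minimal sketch, assuming that nodes are written as binary strings (as in the figures) and that two nodes are comparable exactly when one string is a prefix of the other. It relies on the standard fact that a finite prefix-free set of binary strings is maximal exactly when the sum of 2<sup>−|*p*|</sup> over its nodes equals 1. The function names are illustrative and do not come from the chapter; the node set is the one printed for Fig. 4.

```python
from fractions import Fraction


def is_antichain(nodes):
    """Return True if no node is a prefix of another node."""
    ordered = sorted(nodes)
    # In lexicographic order a prefix is immediately followed by one of its
    # extensions, so checking neighbouring pairs is sufficient.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def is_maximal_antichain(nodes):
    """Return True if the nodes form a maximal finite antichain of the binary tree."""
    # A node of length n covers the fraction 1 / 2**n of all branches.
    covered = sum(Fraction(1, 2 ** len(p)) for p in nodes)
    return is_antichain(nodes) and covered == 1


# Node labels of the assignment stressing the dot over 'i' (Fig. 4).
FIG4_NODES = {
    "0000", "10", "0100",
    "0001", "110", "0101",
    "0010", "1110", "0110",
    "0011", "1111", "0111",
}

print(is_maximal_antichain(FIG4_NODES))   # the assignment of Fig. 4
print(is_maximal_antichain({"0", "10"}))  # branches through node 11 are not covered
```

Exact fractions are used instead of floating-point numbers so that the comparison with 1 is not affected by rounding.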


Note that even when *r* · *c* = 2<sup>*k*</sup> for some *k*, the mapping may still be uneven, since we may assign nodes from different levels to cells in order to stress some areas of the matrix and diminish the influence of others. In any case, the requirement that the range forms a maximal antichain must be fulfilled. The assignment in Fig. 5 shows how to stress the upper-left corner of the *X*<sup>2</sup><sub>2</sub> matrix. The impact of the lower row is much smaller than the impact of the upper row in this case.


$$\begin{array}{cc} 0 & 10 \\ 110 & 111 \end{array}$$

Fig. 5. Uneven assignment for 2 × 2 matrix.

We now construct the metaset *χ* representing the character *a* displayed on the matrix *X*. The domain of the metaset consists of the empty set only: dom(*χ*) = { ∅ }. The set *m*(*X<sub>a</sub>*) ⊂ *A* of nodes corresponding to the selected cells of the matrix forms the range of the metaset representing the sample: ran(*χ*) = *m*(*X<sub>a</sub>*). Since the domain of *χ* contains exactly one element, ∅, we have ran(*χ*) = *χ*[∅]. Thus,

$$\chi = \{ \emptyset \} \times m(X_a) \; . \tag{33}$$

Note that we interpret the membership degree of ∅ in *χ* as the set of selected cells of the character. This membership degree is irrelevant by itself; however, it determines the equality degree of this sample and any CTS supplied during the recognition phase. It also affects the overall result, which is the membership degree of the CTS in the CCP.

As an example, let us represent the character 'c' on the 4 × 4 matrix with the standard assignment, as in Fig. 6. The metaset representing this letter is

$$\begin{aligned} \chi = \{\, & \langle \emptyset, 0001 \rangle, \langle \emptyset, 0010 \rangle, \langle \emptyset, 0011 \rangle, \langle \emptyset, 0100 \rangle, \\ & \langle \emptyset, 1000 \rangle, \langle \emptyset, 1101 \rangle, \langle \emptyset, 1110 \rangle, \langle \emptyset, 1111 \rangle \,\} \; . \end{aligned} \tag{34}$$

The set of nodes corresponding to the selected cells is

$$m(X_{\mathrm{c}}) = \{ 0001, 0010, 0011, 0100, 1000, 1101, 1110, 1111 \} \; . \tag{35}$$


Fig. 6. The character 'c' represented on the 4 × 4 matrix.
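Equation 33 translates directly into a small data structure. The sketch below is only an illustration: it assumes that the standard mapping numbers the cells row by row with binary labels (an assumption that is consistent with Equation 35), represents ∅ by an empty `frozenset`, and marks selected cells with `#`. All function and variable names are hypothetical.

```python
EMPTY = frozenset()  # stands for the empty set, the only element of dom(chi)


def standard_mapping(rows, cols):
    """Assumed row-major even mapping of cells (row, col) to nodes of one tree level."""
    size = rows * cols
    k = size.bit_length() - 1
    if size < 2 or size != 1 << k:
        raise ValueError("rows * cols must be a power of two greater than 1")
    return {(r, c): format(r * cols + c, f"0{k}b")
            for r in range(rows) for c in range(cols)}


def sample_metaset(picture, mapping):
    """Build chi = {EMPTY} x m(X_a) as a set of (EMPTY, node) pairs."""
    return {(EMPTY, mapping[(r, c)])
            for r, line in enumerate(picture)
            for c, cell in enumerate(line)
            if cell == "#"}


# The letter 'c' on the 4 x 4 matrix; '#' marks a selected cell.
LETTER_C = [".###",
            "#...",
            "#...",
            ".###"]

m = standard_mapping(4, 4)
chi = sample_metaset(LETTER_C, m)
selected_nodes = sorted(node for _, node in chi)
print(selected_nodes)
```

Under the row-major assumption, `selected_nodes` lists the same eight nodes as Equation 35.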

#### **3.2 Defining the compound pattern**

Defining the compound character pattern (CCP) is the essential step in the process of character recognition with metasets. The CCP consists of a number of character samples accompanied by quality grades. Each sample describes some point of view on the character. The different shapes collected together give an idea of what the character should look like. They may be supplied by independent experts or they may be samples of handwriting collected from different persons.

The most important factor here is defining the quality grades for the samples included in the CCP as sets of nodes of the binary tree. Instead of giving them numerical values, as is usually done, we initially define the quality grade of each sample to be the part of the character matrix which contains valid data. Thus, the quality grade of a character sample is the set of cells containing correct, necessary pixels of the character or its background. For each sample, the cells of the matrix which are considered bad, missing or unimportant are excluded from the quality grade area and are therefore not taken into account during the recognition process. The mapping *m* transforms this quality set into a subset of the maximal finite antichain *A*. This subset may then be evaluated as a number; however, some information is lost this way (namely, exactly which cells are taken into account and which are considered invalid).

By associating character samples represented as *A*-sample metasets with the corresponding quality grades represented as subsets of *A*, we create the *A*-pattern metaset representing the CCP. Testing character samples, represented by other *A*-sample metasets, are then matched against the *A*-pattern metaset.

If we denote the characters included in the CCP by *c*<sub>1</sub>, *c*<sub>2</sub>, ..., then *X<sub>c<sub>1</sub></sub>*, *X<sub>c<sub>2</sub></sub>*, ... are the sets of cells of the matrix which contain their pixels, i.e. the selected cells. The metasets corresponding to the characters are denoted by *χ*<sub>1</sub>, *χ*<sub>2</sub>, ... (cf. Equation 33):

$$\chi_{i} = \{ \emptyset \} \times m(X_{c_i}) \; . \tag{36}$$

The corresponding quality grades *q*<sub>1</sub>, *q*<sub>2</sub>, ..., when expressed as sets of cells of the matrix *X*, are denoted by *X<sub>q<sub>1</sub></sub>*, *X<sub>q<sub>2</sub></sub>*, .... Thus, *m*(*X<sub>q<sub>1</sub></sub>*), *m*(*X<sub>q<sub>2</sub></sub>*), ... are subsets of *A* specifying the quality grades of the characters *c*<sub>1</sub>, *c*<sub>2</sub>, .... In other words, they are the membership degrees of the *A*-sample metasets *χ<sub>i</sub>* in the *A*-pattern metaset *π* representing the CCP, which is defined as follows (*n* is the number of samples in the pattern):

$$\pi = \bigcup_{i=1}^{n} \{ \chi_i \} \times m(X_{q_i}) \; . \tag{37}$$

The complete structure of the *A*-pattern metaset *π* representing the compound character pattern comprised of the characters *c*<sub>1</sub>, ..., *c<sub>n</sub>* accompanied by the quality grades *q*<sub>1</sub>, ..., *q<sub>n</sub>* is given by the following equation:

$$\pi = \bigcup_{i=1}^{n} \bigl\{ \{ \emptyset \} \times m(X_{c_i}) \bigr\} \times m(X_{q_i}) \; . \tag{38}$$
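As a data-structure sketch of Equations 36 to 38, the pattern metaset can be modelled as a set of pairs whose first component is a sample metaset and whose second component is a node of its quality grade. The code below is an illustration under stated assumptions, not the implementation used in the chapter: the helper names are hypothetical, and the two samples are made-up toy data on a 2 × 2 matrix mapped onto level 2 of the tree.

```python
EMPTY = frozenset()  # stands for the empty set


def sample(selected_nodes):
    """chi_i = {EMPTY} x m(X_ci), frozen so that it can be an element of pi."""
    return frozenset((EMPTY, p) for p in selected_nodes)


def pattern(samples):
    """pi = union over i of {chi_i} x m(X_qi).

    `samples` is an iterable of pairs (selected nodes, quality-grade nodes).
    """
    return {(sample(selected), q) for selected, quality in samples for q in quality}


def degree_set(pi, chi):
    """pi[chi]: the nodes paired with chi, i.e. its membership degree in pi."""
    return {q for element, q in pi if element == chi}


# Hypothetical toy pattern: two samples on a 2 x 2 matrix (nodes 00, 01, 10, 11).
TOY_SAMPLES = [
    ({"00", "11"}, {"00", "01", "11"}),  # sample 1: selected cells, valid cells
    ({"00", "10"}, {"00", "10"}),        # sample 2: selected cells, valid cells
]

pi = pattern(TOY_SAMPLES)
chi1 = sample({"00", "11"})
print(len(pi))                       # number of ordered pairs in pi
print(sorted(degree_set(pi, chi1)))  # quality grade of the first sample
```

Freezing each sample makes it hashable, which mirrors the fact that *χ<sub>i</sub>* appears as the first element of ordered pairs in *π*.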

We illustrate the above formulas with an example. We use the *X*<sup>4</sup><sub>4</sub> matrix with the standard mapping *m* onto the antichain *A* = 𝕋<sub>4</sub>, which is the 4th level of the tree (cf. Fig. 2). Figure 7 depicts three different samples of the letter 'c'. The pixels of the characters are the cells containing binary sequences. Invalid cells are marked gray; the cells without gray background form the quality grades.

Fig. 7. Three samples *c*<sub>1</sub>, *c*<sub>2</sub>, *c*<sub>3</sub> of letter 'c'. Cells without gray background make up quality grades.

We understand that the areas of the matrix with gray background contain pixels which are unreadable, or for which we are not sure whether they are selected or not, or which are distorted somehow, and therefore they cannot be included in the representation of the character without raising doubt. They may also be treated as a mask excluding parts of the matrix from the matching process. Note, however, that when calculating equality degrees the whole matrix area is taken into account, so the excluded parts play a role only in determining the membership (similarity) degree to the CCP.

The sets of nodes corresponding to the selected cells of the characters *c*<sub>1</sub>, *c*<sub>2</sub> and *c*<sub>3</sub> in Fig. 7 are given by the following equations:

$$m(X_{c_1}) = \{ 0001, 0010, 0100, 1000, 1101, 1110 \} \; , \tag{39}$$

$$m(X_{c_2}) = \{ 0000, 0001, 0010, 0100, 1000, 1100, 1101, 1110 \} \; , \tag{40}$$

$$m(X_{c_3}) = \{ 0001, 0010, 0011, 0100, 0101, 1000, 1001, 1101, 1110, 1111 \} \; . \tag{41}$$

The *A*-sample metasets *χ*<sub>1</sub>, *χ*<sub>2</sub>, *χ*<sub>3</sub> representing these characters have the form (cf. Equation 36):

$$\chi_1 = \{\, \langle \emptyset, 0001 \rangle, \langle \emptyset, 0010 \rangle, \langle \emptyset, 0100 \rangle, \langle \emptyset, 1000 \rangle, \langle \emptyset, 1101 \rangle, \langle \emptyset, 1110 \rangle \,\} \; , \tag{42}$$

$$\begin{aligned} \chi_2 = \{\, & \langle \emptyset, 0000 \rangle, \langle \emptyset, 0001 \rangle, \langle \emptyset, 0010 \rangle, \langle \emptyset, 0100 \rangle, \\ & \langle \emptyset, 1000 \rangle, \langle \emptyset, 1100 \rangle, \langle \emptyset, 1101 \rangle, \langle \emptyset, 1110 \rangle \,\} \; , \end{aligned} \tag{43}$$

$$\begin{aligned} \chi_3 = \{\, & \langle \emptyset, 0001 \rangle, \langle \emptyset, 0010 \rangle, \langle \emptyset, 0011 \rangle, \langle \emptyset, 0100 \rangle, \langle \emptyset, 0101 \rangle, \\ & \langle \emptyset, 1000 \rangle, \langle \emptyset, 1001 \rangle, \langle \emptyset, 1101 \rangle, \langle \emptyset, 1110 \rangle, \langle \emptyset, 1111 \rangle \,\} \; . \end{aligned} \tag{44}$$

They comprise the domain of the *A*-pattern metaset *π*: dom(*π*) = { *χ*<sub>1</sub>, *χ*<sub>2</sub>, *χ*<sub>3</sub> }. The quality grades *q<sub>i</sub>* of the samples *c<sub>i</sub>*, represented by the cells without gray background, when mapped to subsets of *A* by the mapping *m*, make up the membership degrees *m*(*X<sub>q<sub>i</sub></sub>*) of *χ<sub>i</sub>* in *π*:

$$\pi[\chi_i] = m(X_{q_i}), \text{ for } i = 1, 2, 3 \; . \tag{45}$$

From Fig. 7 we may read that

$$\begin{aligned} m(X_{q_1}) &= \{ 0000, 0001, 0010, 0011, 0100, 0101, 0110, 1000, 1001, 1100 \} \\ &= \mathbb{T}_4 \setminus \{ 0111, 1010, 1011, 1101, 1110, 1111 \} \; , \end{aligned} \tag{46}$$

$$\begin{aligned} m(X_{q_2}) &= \{ 0100, 0101, 1000, 1001, 1010, 1100, 1101, 1110 \} \\ &= \mathbb{T}_4 \setminus \{ 0000, 0001, 0010, 0011, 0110, 0111, 1011, 1111 \} \; , \end{aligned} \tag{47}$$

$$\begin{aligned} m(X_{q_3}) &= \{ 0011, 0110, 0111, 1010, 1011, 1111 \} \\ &= \mathbb{T}_4 \setminus \{ 0000, 0001, 0010, 0100, 0101, 1000, 1001, 1100, 1101, 1110 \} \; . \end{aligned} \tag{48}$$


Thus, the *A*-pattern metaset *π* representing the CCP has the following complete structure:

$$\begin{aligned}
\pi &= \{ \chi_1 \} \times m(Xq_1) \cup \{ \chi_2 \} \times m(Xq_2) \cup \{ \chi_3 \} \times m(Xq_3) \\
&= \{ \{ \emptyset \} \times \{ 0001, 0010, 0100, 1000, 1101, 1110 \} \} \\
&\qquad \times \{ 0000, 0001, 0010, 0011, 0100, 0101, 0110, 1000, 1001, 1100 \} \\
&\quad \cup \{ \{ \emptyset \} \times \{ 0000, 0001, 0010, 0100, 1000, 1100, 1101, 1110 \} \} \\
&\qquad \times \{ 0100, 0101, 1000, 1001, 1010, 1100, 1101, 1110 \} \\
&\quad \cup \{ \{ \emptyset \} \times \{ 0001, 0010, 0011, 0100, 0101, 1000, 1001, 1101, 1110, 1111 \} \} \\
&\qquad \times \{ 0011, 0110, 0111, 1010, 1011, 1111 \} .
\end{aligned} \tag{49}$$

For each *χ<sup>i</sup>* ∈ dom(*π*) the numerical values of the membership degrees of *χ<sup>i</sup>* in *π*, equal to their numerical quality grades, are given by the formula (cf. Equation 12):

$$\mathrm{m}(\chi_i, \pi) = \sum_{p \in \pi[\chi_i]} \frac{1}{2^{|p|}} , \tag{50}$$

and they are equal to 0.62, 0.5 and 0.38 for the characters *c*<sub>1</sub>, *c*<sub>2</sub>, *c*<sub>3</sub>, respectively (the reader is encouraged to verify this). The numerical representation of the degree loses the information about which particular cells were taken into account. For instance, there are many combinations of cells for which the above formula gives the result 0.5. The numerical value is more human-friendly, however.
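The structure of *π* in Equation 49 and the values given by Equation 50 can be checked with a few lines of Python. The sketch below is only an illustration under stated assumptions: nodes of the tree are encoded as 4-character strings, the *A*-sample metasets are stood in for by the labels `chi1`, `chi2`, `chi3`, and the names `R`, `pi` and `membership_value` are hypothetical helpers, not part of the application mentioned in the chapter.

```python
from fractions import Fraction

# Quality grades m(Xq_i) from Equations 46-48 (nodes of level 4 as strings).
R = {
    "chi1": {"0000", "0001", "0010", "0011", "0100",
             "0101", "0110", "1000", "1001", "1100"},
    "chi2": {"0100", "0101", "1000", "1001", "1010", "1100", "1101", "1110"},
    "chi3": {"0011", "0110", "0111", "1010", "1011", "1111"},
}

# Equation 49: pi is the union of the products {chi_i} x m(Xq_i),
# modelled here (an assumption) as a set of (element label, node) pairs.
pi = {(name, p) for name, nodes in R.items() for p in nodes}


def membership_value(name, relation):
    """Equation 50: sum of 1 / 2^|p| over all nodes p paired with `name`."""
    terms = (Fraction(1, 2 ** len(p)) for n, p in relation if n == name)
    return sum(terms, Fraction(0))


values = {name: membership_value(name, pi) for name in R}
for name in sorted(values):
    print(name, values[name], float(values[name]))
```

The exact values are 10/16, 8/16 and 6/16, that is 0.625, 0.5 and 0.375, which the text quotes as 0.62, 0.5 and 0.38.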

#### **3.3 Evaluating similarity degrees**

Once the compound character pattern (CCP) is prepared we are ready to supply testing character samples (CTS) and evaluate their similarity degrees. The CTS is represented as a metaset in exactly the same manner as CCP elements, i.e., we use the same matrix *X* with the same mapping *m* of cells to some maximal finite antichain *A*.

The process of matching the input character sample, represented by the *A*-sample metaset *τ*, against the prepared compound character pattern, represented by the *A*-pattern metaset *π*, involves calculating the membership degree of *τ* in *π* and the sequence of equality degrees of *τ* and the potential elements *χ<sub>i</sub>* of *π*. The membership degree tells us to what extent the CTS resembles the character defined by the CCP; it is represented by the set M(*τ*, *π*) (see Equation 8). The equality degrees play a supplemental role: they show the similarity of *τ* to each pattern element separately, and these elements, contrary to the CCP, are single characters. They are represented by the sets E(*τ*, *χ<sub>i</sub>*) (see Equation 10). We apply Theorem 5 to determine the similarity degrees and Equations 12–15 to evaluate them numerically.

Let us carry out the calculations for the sample letter 'c' shown in Fig. 6. The metaset *χ* representing the character is defined by Equation 34. First, we establish the notation. The left-hand sides of the following equations correspond to the variables used in Theorem 5, and on the right-hand sides we use the metasets defined in the previous sections, *i* = 1, 2, 3.



$$\rho = \pi \qquad \text{the CCP, see Fig. 7 and Equation 49,} \tag{53}$$

$$S = m(Xc) \qquad \text{the CTS selected cells, see Equation 35,} \tag{54}$$

$$P^i = m(Xc_i) \qquad \text{the selected cells of CCP elements, see Fig. 7 and Equations 39–41,} \tag{55}$$

$$R^i = m(Xq_i) \qquad \text{the CCP quality marks, see Equations 46–48,} \tag{56}$$

$$Q^i = \mathrm{E}(\chi, \chi_i) \qquad \text{the equality sets, see Equations 10 and 22–23,} \tag{57}$$

$$U = \mathrm{M}(\chi, \pi) \qquad \text{the membership set, see Equations 8 and 31–32.} \tag{58}$$

We start with calculating the sets *Q<sup>i</sup>*. Recall that the antichain *A* is equal to the level 𝕋<sub>4</sub> of the tree and that the mapping *m* is shown in Fig. 2.

$$\begin{aligned}
\mathrm{E}(\chi_1, \chi) = Q^1 &= S \cap P^1 \cup (A \setminus S) \cap (A \setminus P^1) \\
&= \{ 0001, 0010, 0011, 0100, 1000, 1101, 1110, 1111 \} \\
&\quad \cap \{ 0001, 0010, 0100, 1000, 1101, 1110 \} \\
&\quad \cup \{ 0000, 0101, 0110, 0111, 1001, 1010, 1011, 1100 \} \\
&\quad \cap \{ 0000, 0011, 0101, 0110, 0111, 1001, 1010, 1011, 1100, 1111 \} \\
&= P^1 \cup (A \setminus S) \\
&= \{ 0001, 0010, 0100, 1000, 1101, 1110 \} \\
&\quad \cup \{ 0000, 0101, 0110, 0111, 1001, 1010, 1011, 1100 \} \\
&= \mathbb{T}_4 \setminus \{ 0011, 1111 \} .
\end{aligned}$$

$$\begin{aligned}
\mathrm{E}(\chi_2, \chi) = Q^2 &= S \cap P^2 \cup (A \setminus S) \cap (A \setminus P^2) \\
&= \{ 0001, 0010, 0011, 0100, 1000, 1101, 1110, 1111 \} \\
&\quad \cap \{ 0000, 0001, 0010, 0100, 1000, 1100, 1101, 1110 \} \\
&\quad \cup \{ 0000, 0101, 0110, 0111, 1001, 1010, 1011, 1100 \} \\
&\quad \cap \{ 0011, 0101, 0110, 0111, 1001, 1010, 1011, 1111 \} \\
&= P^2 \setminus \{ 0000, 1100 \} \cup (A \setminus S) \setminus \{ 0000, 1100 \} \\
&= \{ 0001, 0010, 0100, 1000, 1101, 1110 \} \\
&\quad \cup \{ 0101, 0110, 0111, 1001, 1010, 1011 \} \\
&= \mathbb{T}_4 \setminus \{ 0000, 0011, 1100, 1111 \} .
\end{aligned}$$

$$\begin{aligned}
\mathrm{E}(\chi_3, \chi) = Q^3 &= S \cap P^3 \cup (A \setminus S) \cap (A \setminus P^3) \\
&= \{ 0001, 0010, 0011, 0100, 1000, 1101, 1110, 1111 \} \\
&\quad \cap \{ 0001, 0010, 0011, 0100, 0101, 1000, 1001, 1101, 1110, 1111 \} \\
&\quad \cup \{ 0000, 0101, 0110, 0111, 1001, 1010, 1011, 1100 \} \\
&\quad \cap \{ 0000, 0110, 0111, 1010, 1011, 1100 \} \\
&= S \cup (A \setminus P^3) \\
&= \{ 0001, 0010, 0011, 0100, 1000, 1101, 1110, 1111 \} \\
&\quad \cup \{ 0000, 0110, 0111, 1010, 1011, 1100 \} \\
&= \mathbb{T}_4 \setminus \{ 0101, 1001 \} .
\end{aligned}$$

From Equation 14 we obtain the numerical values of the equality degrees:

$$\begin{aligned}
\mathrm{e}(\chi_1, \chi) &= 1 - \frac{2}{2^4} = \frac{7}{8} , \\
\mathrm{e}(\chi_2, \chi) &= 1 - \frac{4}{2^4} = \frac{6}{8} , \\
\mathrm{e}(\chi_3, \chi) &= 1 - \frac{2}{2^4} = \frac{7}{8} .
\end{aligned}$$

The results show that the character in Fig. 6 resembles the characters *c*<sub>1</sub> and *c*<sub>3</sub> in Fig. 7 equally well, and the character *c*<sub>2</sub> slightly worse. Note that we do not take into account the qualities *q<sub>i</sub>* of the samples *c<sub>i</sub>* when calculating the equality degrees.
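The equality sets and values above can be recomputed mechanically. The following Python sketch is an illustration only: it assumes nodes are encoded as 4-character strings, and the helper names `equality_set` and `equality_value` are hypothetical. The value is computed as the fraction of agreeing nodes, which coincides with the expression 1 − |difference| / 2<sup>4</sup> used above for this particular antichain.

```python
from fractions import Fraction
from itertools import product

# Level 4 of the binary tree: all 16 nodes, encoded as 4-character strings.
T4 = {"".join(bits) for bits in product("01", repeat=4)}

# Selected cells S of the tested sample, as listed in the derivation above.
S = {"0001", "0010", "0011", "0100", "1000", "1101", "1110", "1111"}

# Selected cells P^i of the pattern elements, Equations 39-41.
P = {
    1: {"0001", "0010", "0100", "1000", "1101", "1110"},
    2: {"0000", "0001", "0010", "0100", "1000", "1100", "1101", "1110"},
    3: {"0001", "0010", "0011", "0100", "0101",
        "1000", "1001", "1101", "1110", "1111"},
}


def equality_set(s, p, level=T4):
    """Q = S ∩ P ∪ (A \\ S) ∩ (A \\ P): nodes on which both samples agree."""
    return (s & p) | ((level - s) & (level - p))


def equality_value(s, p, level=T4):
    """Fraction of agreeing nodes on the given level."""
    return Fraction(len(equality_set(s, p, level)), len(level))


Q = {i: equality_set(S, P[i]) for i in P}
e = {i: equality_value(S, P[i]) for i in P}
for i in sorted(P):
    # Print the nodes missing from Q^i together with the equality value.
    print(i, sorted(T4 - Q[i]), e[i])
```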

Now we calculate the membership set M(*χ*, *π*) (*U* in terms of Theorem 5) and the membership value m(*χ*, *π*) of *χ* in *π*. We apply the equations 46–48.

$$\begin{aligned}
\mathrm{M}(\chi, \pi) = U &= Q^1 \cap R^1 \cup Q^2 \cap R^2 \cup Q^3 \cap R^3 \\
&= Q^1 \cap m(Xq_1) \cup Q^2 \cap m(Xq_2) \cup Q^3 \cap m(Xq_3) \\
&= (\mathbb{T}_4 \setminus \{ 0011, 1111 \}) \\
&\quad \cap (\mathbb{T}_4 \setminus \{ 0111, 1010, 1011, 1101, 1110, 1111 \}) \\
&\quad \cup (\mathbb{T}_4 \setminus \{ 0000, 0011, 1100, 1111 \}) \\
&\quad \cap (\mathbb{T}_4 \setminus \{ 0000, 0001, 0010, 0011, 0110, 0111, 1011, 1111 \}) \\
&\quad \cup (\mathbb{T}_4 \setminus \{ 0101, 1001 \}) \\
&\quad \cap (\mathbb{T}_4 \setminus \{ 0000, 0001, 0010, 0100, 0101, 1000, 1001, 1100, 1101, 1110 \}) \\
&= m(Xq_1) \setminus \{ 0011 \} \cup m(Xq_2) \setminus \{ 1100 \} \cup m(Xq_3) \\
&= \mathbb{T}_4 .
\end{aligned}$$

Clearly, by the Equation 12 and by the Lemma 1,

$$\mathbf{m}(\chi, \pi) = 1 \; . \tag{59}$$

This means that the sample *χ* perfectly matches the pattern *π*.
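The membership set can be recomputed in the same way. The sketch below is a hedged illustration: it uses the form *U* = *Q*<sup>1</sup> ∩ *R*<sup>1</sup> ∪ *Q*<sup>2</sup> ∩ *R*<sup>2</sup> ∪ *Q*<sup>3</sup> ∩ *R*<sup>3</sup> from the derivation above, encodes nodes as strings, and introduces the hypothetical helpers `equality_set` and `membership_set`. The numerical value is taken as the fraction of level-4 nodes in *U*, an assumption that agrees with the result m(*χ*, *π*) = 1 stated above.

```python
from fractions import Fraction
from itertools import product

T4 = {"".join(bits) for bits in product("01", repeat=4)}

# Tested sample (selected cells S) and pattern data (P^i, R^i) from the text.
S = {"0001", "0010", "0011", "0100", "1000", "1101", "1110", "1111"}
P = {
    1: {"0001", "0010", "0100", "1000", "1101", "1110"},
    2: {"0000", "0001", "0010", "0100", "1000", "1100", "1101", "1110"},
    3: {"0001", "0010", "0011", "0100", "0101",
        "1000", "1001", "1101", "1110", "1111"},
}
R = {
    1: {"0000", "0001", "0010", "0011", "0100",
        "0101", "0110", "1000", "1001", "1100"},
    2: {"0100", "0101", "1000", "1001", "1010", "1100", "1101", "1110"},
    3: {"0011", "0110", "0111", "1010", "1011", "1111"},
}


def equality_set(s, p):
    """Nodes on which two samples agree (both select the cell or both omit it)."""
    return (s & p) | ((T4 - s) & (T4 - p))


def membership_set(sample):
    """U = union over i of Q^i ∩ R^i, with Q^i the equality set for element i."""
    u = set()
    for i in P:
        u |= equality_set(sample, P[i]) & R[i]
    return u


U = membership_set(S)
m = Fraction(len(U), len(T4))
print(sorted(T4 - U), m)  # nodes missing from U, and the membership value
```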

#### **3.4 Discussion of the results**

An interesting question arises: what are the similarity degrees of each pattern element *χ<sub>i</sub>* to the pattern *π* itself? It turns out that the pattern samples do not have to be of the best quality to ensure that other input samples result in perfect matches.

We show that the membership sets M(*χ<sub>i</sub>*, *π*) are proper subsets of 𝕋<sub>4</sub> and, therefore, that the membership values m(*χ<sub>i</sub>*, *π*) are less than 1 for all *i* = 1, 2, 3. We present only the results of the calculations, leaving the details to the reader. Let us start with the equality sets:

$$\mathrm{E}(\chi_i, \chi_j) = \mathbb{T}_4 \setminus \mathrm{D}(\chi_i, \chi_j) , \tag{60}$$

where D(*χ<sub>i</sub>*, *χ<sub>j</sub>*) are the difference sets listed in Table 1.
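The details left to the reader can be sketched in Python. This is an illustration under assumptions, not the author's implementation: the difference set is read as the symmetric difference of the selected cells (which is what the formula for *Q<sup>i</sup>* in Section 3.3 amounts to), nodes are encoded as strings, and `difference_set` and `membership_set` are hypothetical helper names. The final loop checks by brute force, over all 2<sup>16</sup> possible samples, how many reach the full membership set.

```python
from fractions import Fraction
from itertools import product

NODES = sorted("".join(bits) for bits in product("01", repeat=4))
T4 = set(NODES)

# Selected cells P^i (Equations 39-41) and quality grades R^i (Equations 46-48).
P = {
    1: {"0001", "0010", "0100", "1000", "1101", "1110"},
    2: {"0000", "0001", "0010", "0100", "1000", "1100", "1101", "1110"},
    3: {"0001", "0010", "0011", "0100", "0101",
        "1000", "1001", "1101", "1110", "1111"},
}
R = {
    1: {"0000", "0001", "0010", "0011", "0100",
        "0101", "0110", "1000", "1001", "1100"},
    2: {"0100", "0101", "1000", "1001", "1010", "1100", "1101", "1110"},
    3: {"0011", "0110", "0111", "1010", "1011", "1111"},
}


def difference_set(a, b):
    """Assumed reading of D: cells selected in exactly one of the two samples."""
    return a ^ b


def membership_set(sample):
    """Union over j of E(sample, chi_j) ∩ R^j, with E = T4 \\ D (Equation 60)."""
    u = set()
    for j in P:
        u |= (T4 - difference_set(sample, P[j])) & R[j]
    return u


D = {(i, j): difference_set(P[i], P[j]) for i in P for j in P}
M = {i: membership_set(P[i]) for i in P}
m = {i: Fraction(len(M[i]), len(T4)) for i in P}
for i in sorted(P):
    # Nodes missing from M(chi_i, pi) and the resulting membership value.
    print(i, sorted(T4 - M[i]), float(m[i]))

# Brute force over all 2^16 samples: which ones reach the full membership set?
perfect = []
for mask in product((False, True), repeat=len(NODES)):
    sample = {node for node, chosen in zip(NODES, mask) if chosen}
    if membership_set(sample) == T4:
        perfect.append(sample)
print(len(perfect))
```

Under these assumptions none of the three pattern elements reaches the full set 𝕋<sub>4</sub>, while a small number of other samples do, which illustrates the claim that imperfect pattern samples can still yield perfect matches for other inputs.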



| D(*χ<sub>i</sub>*, *χ<sub>j</sub>*) | *χ*<sub>1</sub> | *χ*<sub>2</sub> | *χ*<sub>3</sub> |
|---|---|---|---|
| *χ*<sub>1</sub> | ∅ | 0000, 1100 | 0011, 0101, 1001, 1111 |
| *χ*<sub>2</sub> | 0000, 1100 | ∅ | 0000, 0011, 0101, 1001, 1100, 1111 |
| *χ*<sub>3</sub> | 0011, 0101, 1001, 1111 | 0000, 0011, 0101, 1001, 1100, 1111 | ∅ |

Table 1. Difference sets D(*χ<sub>i</sub>*, *χ<sub>j</sub>*) for compound pattern elements

The matrix in Table 1 is symmetric, since *χ<sub>i</sub>* ≈<sub>*p*</sub> *χ<sub>j</sub>* is equivalent to *χ<sub>j</sub>* ≈<sub>*p*</sub> *χ<sub>i</sub>*. The empty sets on the diagonal confirm that *χ<sub>i</sub>* ≈<sub>*p*</sub> *χ<sub>i</sub>* for each *p* ∈ 𝕋<sub>4</sub>. We conclude that the equality values are as shown in Table 2.

| e(*χ<sub>i</sub>*, *χ<sub>j</sub>*) | *χ*<sub>1</sub> | *χ*<sub>2</sub> | *χ*<sub>3</sub> |
|---|---|---|---|
| *χ*<sub>1</sub> | 1 | 0.88 | 0.75 |
| *χ*<sub>2</sub> | 0.88 | 1 | 0.62 |
| *χ*<sub>3</sub> | 0.75 | 0.62 | 1 |

Table 2. Equality values e(*χ<sub>i</sub>*, *χ<sub>j</sub>*) for compound pattern elements

Based on the above sets we calculate the membership sets and the membership values, similarly as before.

$$\begin{aligned}
\mathrm{M}(\chi_1, \pi) &= \mathbb{T}_4 \setminus \{ 1111 \} , & \mathrm{m}(\chi_1, \pi) &= 0.94 , \\
\mathrm{M}(\chi_2, \pi) &= \mathbb{T}_4 \setminus \{ 0000, 1111 \} , & \mathrm{m}(\chi_2, \pi) &= 0.88 , \\
\mathrm{M}(\chi_3, \pi) &= \mathbb{T}_4 \setminus \{ 0101, 1001 \} , & \mathrm{m}(\chi_3, \pi) &= 0.88 .
\end{aligned}$$

Thus, the similarity values of the characters *c*<sub>1</sub>, *c*<sub>2</sub> and *c*<sub>3</sub> to the CCP built on top of them are 0.94, 0.88 and 0.88, respectively. None of them matches the pattern to the highest degree. Even though the membership values of *χ<sub>i</sub>* in *π* are less than 1, there exist samples which match the CCP to the highest possible degree, with the membership value equal to 1. Besides the character in Fig. 6, there are three more, shown in Fig. 8, for which the similarity degree reaches the maximal value.

Fig. 8. Three remaining samples with the best similarity to the pattern represented by *π*.

Note that the samples in Fig. 6 and Fig. 8 differ only in pixels 0011 and 1100. In at least one of the CCP elements in Fig. 7 each of these pixels belongs to the sample, and in at least one other element it belongs to the background, without being excluded by the quality area in either case.

The character samples *c<sub>i</sub>* and their quality grades *q<sub>i</sub>* were intentionally chosen so that they do not match the CCP to the highest degree, in order to demonstrate the interpolation capabilities of the new mechanism. In typical cases, one constructs the CCP from good samples, which reflect most characteristics of the modelled pattern.

When creating a CCP one should bear in mind the following rule: each pixel of the matrix must be covered by the foreground or background of at least one sample in the pattern. By covering we mean that the pixel is included in at least one quality area. The reader may confirm that this rule is satisfied in our example. If there exists a cell which is contained in the exclusion area of every sample, then the similarity value of 1 cannot be reached by any sample.

#### **4. Conclusions**

We demonstrated a method for character recognition based on metasets. The core of the idea lies in representing character samples and character patterns directly as metasets, and in interpreting the membership and equality degrees of the corresponding metasets as the similarity degrees of characters. Although the idea is quite simple and straightforward, it seems to work well. The experiments carried out with the computer application<sup>1</sup> implementing this model confirm that it adequately reflects human perception of similarity of characters. As we have seen, the mechanism requires some laborious calculations; however, they are meant to be carried out by machines.

So far, no comparisons with other techniques for character recognition have been made. It must be stressed that the presented method is not yet competitive, by itself, with commercial solutions. It is rather a sketch of an idea which, when applied in cooperation with other techniques used for data processing, like centering and sharpening of character images, may turn out to have some advantages over other solutions.

The main goal of this chapter was to convince the reader that the idea of metaset is applicable to solving problems related to the processing of vague, imprecise data and, moreover, that modelling the real world using metasets is quite natural and simple. We showed that metaset membership correctly mimics similarity when characters are appropriately encoded.

It should be clear that the discussed method has a much wider scope of applications than the recognition of letters. Although we presented the version for monochromatic (binary) images, it is not difficult to generalize it to color (many-valued) ones. The next step in research on the subject will focus on determining the characteristics of graphical data for which this method gives the best results.

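#### **5. References**

Atanassov, K. T. (1986). Intuitionistic fuzzy sets, *Fuzzy Sets and Systems* 20: 87–96.

Kunen, K. (1980). *Set Theory, An Introduction to Independence Proofs*, number 102 in *Studies in Logic and Foundations of Mathematics*, North-Holland Publishing Company.

Pawlak, Z. (1982). Rough sets, *International Journal of Computer and Information Sciences* 11: 341–356.

Starosta, B. (2009). Application of metasets to character recognition, *Proc. of 18th International Symposium, ISMIS 2009*, Vol. 5722 of *Lecture Notes in Artificial Intelligence*, pp. 602–611.

Starosta, B. (2010). Representing intuitionistic fuzzy sets as metasets, *Developments in Fuzzy Sets, Intuitionistic Fuzzy Sets, Generalized Nets and Related Topics. Volume I: Foundations*, pp. 185–208.

Starosta, B. & Kosiński, W. (2009). *Views on Fuzzy Sets and Systems from Different Perspectives. Philosophy and Logic, Criticisms and Applications*, Vol. 243 of *Studies in Fuzziness and Soft Computing*, Springer Verlag, chapter Meta Sets. Another Approach to Fuzziness, pp. 509–522.

Zadeh, L. A. (1965). Fuzzy sets, *Information and Control* 8: 338–353.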


<sup>1</sup> The application is available as Java applet under the URL: http://www.pjwstk.edu.pl/~barstar/Research/MSOCR/index.html

**3**

**Recognition of Tifinaghe Characters Using Dynamic Programming & Neural Network**

Rachid El Ayachi, Mohamed Fakir and Belaid Bouikhalene

*Sultan Moulay Slimane University/ Faculty of Sciences and Techniques Morocco*

#### **1. Introduction**

Optical Character Recognition (OCR) is one of the most successful applications of automatic pattern recognition. The field of character recognition is very important. Several studies have been conducted on Latin, Arabic and Chinese characters (Bozinovic and Shihari, 1989; Brown, 1983; Fakir and Sodeyama, 1993; Fakir, 2001; Chaudhuri et al, 2002; Blumenstein et al, 2002; Miyazaki et al, 1974; Mezghani et al, 2008; Lallican et al, 2000; Burr, 1982), and various commercial applications have been produced, such as bank cheque processing, postal automation and document processing. However, for Amazigh characters, called Tifinaghe, few studies have been published in the literature. Among them are (El ayachi and Fakir, 2009; Amrouch et al, 2009; Es saady, 2009; Fakir et al, 2009; El ayachi et al, 2010). Because characters are sensitive to noise, the main problem in this field is how to extract strokes. This may be solved by selecting useful features, which in automatic character recognition are customarily divided into two types: global and local features. The principle of global features is based on the transformation of the character matrix into a new domain to extract features. The selection of local features is based on geometrical and topological properties of the character, such as stroke direction, stroke density, stroke length and position.

Unlike Latin characters, Tifinaghe characters are formed by loops, lines and curves. This makes it difficult to describe a character in one parametric form. In this study, invariant moments, modified invariant moments and the Walsh transform are used as features for the recognition of Tifinaghe characters. Fig. 1 illustrates the block diagram of the proposed recognition system. Tifinaghe texts were transferred to the computer through an image scanner.

The process consists of four phases. After preliminary pre-processing (position normalization, noise reduction and skew correction), a text is segmented into lines, and lines into what are assumed to be characters, in the second phase. In the third phase feature extraction methods are applied. In the last phase the recognition procedure is completed; here a Multilayer Neural Network and a Dynamic Programming technique are used to classify characters. These phases are described in the following sections, but before that a brief explanation of the characteristics of Tifinaghe characters is given.

