**That IS-IN Isn't IS-A: A Further Analysis of Taxonomic Links in Conceptual Modelling**

Jari Palomäki and Hannu Kangassalo, *University of Tampere, Finland*

## **1. Introduction**

In his fundamental article "What IS-A Is and Isn't: An Analysis of Taxonomic Links in Semantic Networks" (1983), Ronald J. Brachman analysed and catalogued different interpretations of the inheritance link called "IS-A", which is used in different kinds of knowledge-representation systems. Brachman sees this IS-A link as a relation "between the representational objects," which forms a "taxonomic hierarchy, a tree or a lattice-like structures for categorizing classes of things in the world being represented" (ibid., 30). This very opening phrase of Brachman's article reveals, and our further analysis of the article in this Chapter confirms, that he treats the IS-A relation and the different interpretations given to it as an *extensional* relation. Accordingly, in this Chapter we consider an *intensional* IS-IN relation, which also forms a taxonomic hierarchy and a lattice-like structure. In addition, the hierarchy provided by the IS-IN relation can be considered a semantic network as well. On the other hand, unlike the IS-A relation, the IS-IN relation is a conceptual relation between concepts, and it is basically intensional in character.

The purpose of this Chapter is to argue that the IS-IN relation is not the same as the IS-A relation; more specifically, that Brachman's analysis of the extensional IS-A relation did not cover an intensional IS-IN relation. We do not claim that Brachman's analysis of the IS-A relation is wrong or flawed. Our claim is that the IS-IN relation requires a different analysis from the one given to the IS-A relation, for example, by Brachman.

This Chapter is organized as follows. First, we consider the different meanings of the IS-A relation, especially as they are analysed by Brachman (1983), and we analyse them further. Second, we turn our attention to the IS-IN relation. We start by considering the different senses of "in", turning first to Aristotle's and then to Leibniz's account of it. Third, we proceed to the basic relations between terms, concepts, classes (or sets), and things, in order to propose a more proper use of the IS-IN relation and to clarify its relation to the IS-A relation. Lastly, by way of conclusion, we consider some advantages and some difficulties related to the intensional versus extensional approaches to conceptual modelling.

Advances in Knowledge Representation

#### **2. The different meanings for the IS-A relation**

The idea of the IS-A relation seems to derive from English sentences such as "Socrates is a man" and "a cat is a mammal". These sentences provide two basic forms of using the IS-A relation. The first is a *predication*, where an individual (Socrates) is said to have a predicate (a man). The second is a *subtype* relation, where one predicate (a cat) is said to be a subtype of another predicate (a mammal). This second form is commonly expressed by the universally quantified conditional "for all entities *x*, if *x* is a cat, then *x* is a mammal". However, this formalization of the second use reveals that it combines two common uses of the IS-A relation. Firstly, in expressions of the form "*x* is a cat" and "*x* is a mammal", the IS-A relation is used as a predication. Secondly, by means of the universal quantifier and the implication, the IS-A relation is used not as a predication but as a connection between two predicates.

Accordingly, we can divide the uses of the IS-A relation into two major subtypes: one relating an individual to a species, and the other relating two species. When analysing the different meanings of the IS-A relation, Brachman uses this division, calling them *generic*/*individual* and *generic*/*generic* relations (Brachman 1983, 32).

#### **2.1 Generic/individual relations**

Brachman gives four different meanings for the IS-A relation connecting an individual and a generic, which we shall list and analyse as follows (ibid.):

1. *A set membership relation*, for example, "Socrates is a man". Here "Socrates" is an individual, "a man" is a set, and Socrates is a member of the set of men. Accordingly, the IS-A is an ∈-relation.

2. *A predication*, for example, the predicate "man" is predicated of the individual "Socrates". We may say that a predicate and an individual are combined by a copula expressing a kind of function-argument relation. Brachman does not mention a copula in his article, but according to this view the IS-A is a copula.

3. *A conceptual containment relation*, for which Brachman gives the following example: "a king" and "the king of France", where the generic "king" is used to construct the individual description. Here Brachman's explanation and example are confusing. Firstly, "France" is an individual, and we could say that the predicate "a king" is predicated of "France", in which case the IS-A relation is a copula. Secondly, we could say that the concept of "king" applies to "France", in which case the IS-A relation is an application relation. Thirdly, the phrase "the king of France" is a definite description, in which case we could say that the king of France is a definite member of the set of kings, *i.e.*, the IS-A relation is the converse of the ∈-relation.

4. *An abstraction*, for example, when from the particular man "Socrates" we abstract the general predicate "a man". Hence we could say that "Socrates" falls under the concept of "man", *i.e.*, the IS-A is a falls-under relation. Alternatively, we could say that "Socrates" is a member of the set of "man", *i.e.*, the IS-A is an ∈-relation.

We may notice in the above analysis of Brachman's meanings of the IS-A relation between individuals and generics that, in three of the four cases, we were able to interpret the IS-A relation by means of the ∈-relation. And, of course, the copula expressing a function-argument relation can also be expressed by the ∈-relation. Moreover, in our analysis of 3. and 4. we used the term "concept" in a sense that Brachman did not use. He seems to use the term "concept" synonymously with the expression "a structured description", although, in our view, the two are not the same. In any case, what Brachman calls here a conceptual containment relation is not the conceptual containment relation as we shall use it; see Section 4 below.

#### **2.2 Generic/generic relations**

Brachman gives six different meanings for the IS-A relation connecting two generics, which we shall list and analyse as follows (ibid.):

1. *A subset/superset relation*, for example, "a cat is a mammal". Here "a cat" is the set of cats and "a mammal" is the set of mammals. The set of cats is a subset of the set of mammals, and the set of mammals is a superset of the set of cats. Accordingly, the IS-A relation is a ⊆-relation.

2. *A generalization/specialization*, for example, "a cat is a mammal" means that "for all entities *x*, if *x* is a cat, then *x* is a mammal". Now we have two possibilities. The first is to interpret "*x* is a cat" and "*x* is a mammal" as predications by means of the copula, with a formal implication as the relation between them. Then the predicate "cat" is a specialization of the predicate "mammal", and the predicate "mammal" is a generalization of the predicate "cat". Thus we can say that the IS-A relation is the formal implication $(\forall x)(P(x) \rightarrow Q(x))$. The second is to interpret "*x* is a cat" and "*x* is a mammal" by means of the ∈-relation. We can then define a ⊆-relation by means of a formal implication, from which we get that the IS-A relation is a ⊆-relation.

3. *An AKO*, meaning "a kind of", for example, "a cat is a mammal", where "a cat" is a kind of "mammal". As Brachman points out (ibid.), AKO has much in common with generalization, but it implies "kind" status for the terms it connects, whereas generalization relates arbitrary predicates. That is, to be a kind is to have an essential property (or set of properties) that makes it the kind that it is. Hence, being "a cat", it is necessary to be "a mammal" as well. This leads us to natural kind inferences: if anything of a kind *A* has an essential property, then every *A* has that property. Thus we are led to Aristotelian essentialism and to quantified modal logic, in which the IS-A relation is interpreted as the necessary formal implication $\Box(\forall x)(P(x) \rightarrow Q(x))$. However, it is to be noted that there are two relations connected with the AKO relation. The first one is the relation between an essential property and the kind, and the second one is the relation between kinds. Brachman does not make this distinction in his article, and he does not consider the second one. Provided there are such things as kinds, in our view they would be connected with the IS-IN relation, which we shall consider in Section 4 below.

4. *A conceptual containment*. Following Brachman (ibid.), instead of reading "a cat is a mammal" as a simple generalization, it is to be read as "to be a cat is to be a mammal". This, according to him, is the IS-A of lambda-abstraction, wherein one predicate is used in defining another (ibid.). Unfortunately, it is not clear what Brachman means by "the IS-A of lambda-abstraction, wherein one predicate is used in defining another". Suppose it means that the predicates occurring in the *definiens* are among the predicates occurring in the *definiendum*. Then there are three possible interpretations. The first one takes the IS-A relation as an ∈-relation between predicates, *i.e.*, the predicate "mammal" is among the predicates of "cat". The second one takes the IS-A as the $=_{df}$-sign between *definiens* and *definiendum*, or, perhaps, as a lambda-abstraction of it, *i.e.*, $\lambda x \lambda y\,(x =_{df} y)$, although, of course, "a cat $=_{df}$ a mammal" is not a complete definition of a cat. The third possibility is that it is the IS-IN relation between *concepts*, *i.e.*, the concept of "mammal" is contained in the concept of "cat" (see Section 4 below). This Chapter argues that the IS-IN relation is not the IS-A relation.

5. *A role value restriction*, for example, "the car is a bus", where "the car" is a role and "a bus" is a value that is itself a certain type. Thus, the IS-A is a copula.

6. *A set and its characteristic type*, for example, the set of all cats and the concept of "a cat". Then we could say that the IS-A is an extension relation between the concept and its extension, where the extension of a concept is the set of all those things falling under the concept in question. On the other hand, Brachman also says that it associates the characteristic function of a set with that set (ibid.). That would mean that we have a characteristic function $\chi_{Cat}: X \rightarrow \{0, 1\}$ defined for elements $x \in X$ by

   $$\chi_{Cat}(x) = \begin{cases} 1, & \text{if } x \in Cat, \\ 0, & \text{if } x \notin Cat, \end{cases}$$

   where $Cat$ is the set of cats, *i.e.*, $Cat = \{x \mid Cat(x)\}$, and $Cat(x)$ is the predicate of being a cat. Accordingly, $Cat \subseteq X$, and, in particular, the IS-A is a relation between the characteristic function $\chi_{Cat}$ and the set $Cat$.

In the above analysis of Brachman's meanings of the IS-A relation between two generics, three of them could not be interpreted in set-theoretical terms alone. These are the AKO relation, the conceptual containment relation, and the relation between a "set and its characteristic type". Since set theory is extensional *par excellence*, the reason for that failure lies simply in the fact that their adequate analysis involves some intensional elements. However, the AKO relation is based on a philosophical, *i.e.*, ontological, view that there are such things as kinds, and thus we shall not take it as a proper candidate for *the* IS-A relation. On the other hand, both the conceptual containment relation and the relation between a "set and its characteristic type" have "concepts" as their terms, and concepts are basically intensional entities. Accordingly, we propose that their adequate analysis requires an intensional IS-IN relation. This relation differs from the most commonly used kinds of IS-A relations, which can be analysed set-theoretically. Thus, we shall turn to the IS-IN relation.

#### **3. The IS-IN relation**

The idea of the IS-IN relation is close to that of the IS-A relation. The distinction we propose to draw between them is that the IS-A relation is analysable by means of set theory, whereas the IS-IN relation is an intensional relation between concepts.

To analyse the IS-IN relation we shall concentrate on the word "in", which has a complex variety of meanings. First we may note that "in" is a kind of relational expression. Thus, we can put the matter in formal terms as follows:

*A* is *in* *B*.

Now we can consider what the different senses of "in" are, and what substitutions for *A* and *B* go along with each of those senses. To do this we turn first to Aristotle, who discusses the term "in" in his *Physics* (210a 15ff., 1930). He lists the following senses of "in" in which one thing is said to be "in" another:

1. The sense in which a physical part is *in* a physical whole to which it belongs. For example, as the finger is *in* the hand.
2. The sense in which a whole is *in* the parts that make it up.
3. The sense in which a species is *in* its genus, as "man" is *in* "animal".
4. The sense in which a genus is *in* any of its species, or, more generally, any feature of a species is *in* the definition of the species.
5. The sense in which form is *in* matter. For example, "health is *in* the hot and cold".
6. The sense in which events center *in* their primary motive agent. For example, "the affairs of Greece center *in* the king".
7. The sense in which the existence of a thing centers *in* its final cause, its end.
8. The sense in which a thing is *in* a place.

From this list of eight different senses of "in" it is possible to discern four groups:

i. That which has to do with the *part*-*whole* relation, (1) and (2). This is either the relation of a part to the whole or its converse, the relation of a whole to its part.

ii. That which has to do with the *genus*-*species* relation, (3) and (4). Either *A* is the genus and *B* the species, or *A* is the species and *B* the genus.

iii. That which has to do with a *causal* relation, (5), (6), and (7). There are, according to Aristotle, four kinds of causes: material, formal, efficient, and final. Thus, *A* may be the formal cause (form) and *B* the matter. Or *A* may be the efficient cause ("motive agent") and *B* the effect. Or, given *A*, some particular thing or event *B* is its final cause (*telos*).

iv. That which has to do with a *spatial* relation, (8). Aristotle recognizes this as the "strictest sense of all". *A* is said to be *in* *B*, where *A* is one thing and *B* is another thing or a place. "Place", for Aristotle, is thought of as what is occupied by some body. A thing located in some body is also located in some place. Thus we may designate *A* as the contained and *B* as the container.


What concerns us here is the second group (ii), *i.e.*, that which has to do with the *genus*-*species* relation, and especially the sense of "in" in which a genus *is in* any of its species. Most importantly, in our view, it is this passage in Aristotle's text to which Leibniz refers when he says that "Aristotle himself seems to have followed the way of ideas [*viam idealem*], for he says that animal is in man, namely a concept in a concept; for otherwise men would be among animals [*insint animalibus*]" (Leibniz *after* 1690a, 120). In this sentence Leibniz points out the distinction between the conceptual level and the level of individuals, which includes the level of sets of individuals. This distinction is crucial, and our proposal for distinguishing the IS-IN relation from the IS-A relation is based on it. In what follows, we shall call the IS-IN relation an intensional containment relation between concepts.
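Leibniz's distinction between the two levels can be made concrete in a small Python sketch. It is only an illustration under simplifying assumptions. The concepts, the individuals, and the helper names `is_in`, `is_a_member`, and `is_a_subset` are hypothetical. An intension is crudely represented by the set of concepts it contains. IS-IN is then checked at the conceptual level, while the two IS-A readings (∈ and ⊆) are checked at the level of individuals and their sets.

```python
# Illustrative sketch only: concepts, "contained concepts", and individuals
# below are hypothetical toy data, not taken from the Chapter.

# Conceptual level (simplifying assumption): each concept is represented by
# the set of concepts contained in its intension, itself included.
INTENSIONS = {
    "animal": frozenset({"animal"}),
    "man": frozenset({"man", "animal", "rational"}),
}

# Level of individuals: the (hypothetical) extension of each concept.
EXTENSIONS = {
    "animal": {"Socrates", "Plato", "Bucephalus"},
    "man": {"Socrates", "Plato"},
}


def is_in(a: str, b: str) -> bool:
    """IS-IN: concept a is in concept b, i.e. a belongs to b's intension."""
    return a in INTENSIONS[b]


def is_a_member(individual: str, concept: str) -> bool:
    """IS-A read as set membership: the individual is in the concept's extension."""
    return individual in EXTENSIONS[concept]


def is_a_subset(a: str, b: str) -> bool:
    """IS-A read as set inclusion: the extension of a is a subset of that of b."""
    return EXTENSIONS[a] <= EXTENSIONS[b]


print(is_in("animal", "man"))          # True: animal is in man (concept in concept)
print(is_a_subset("man", "animal"))    # True: men are among animals
print(is_a_member("Socrates", "man"))  # True: Socrates is a man
print(is_in("man", "animal"))          # False: man is not in animal
```

Note the reversal between the levels: *animal* is in *man*, whereas the set of men is included in the set of animals.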

#### **4. Conceptual structures**

Although the IS-A relation seems to derive from English sentences such as "Socrates is a man" and "a cat is a mammal", the word "is" is, logically speaking, intolerably ambiguous, and great care is needed not to confound its various meanings. For example, we have (1) the sense in which it asserts Being, as in "*A* is"; (2) the sense of identity, as in "Cicero is Tullius"; (3) the sense of equality, as in "the sum of 6 and 8 is 14"; (4) the sense of predication, as in "the sky is blue"; (5) the sense of definition, as in "the *power set* of *A* is the set of all subsets of *A*"; etc. There are also less common uses, as in "to be good is to be happy", where a relation between assertions is meant, and which gives rise to a formal implication. All this shows that natural language is not precise enough to make clear the different meanings of the word "is", and hence of the words "is a" and "is in". Accordingly, to make the differences between the IS-A relation and the IS-IN relation clear, we turn our attention to logic.

#### **4.1 Items connected to a concept**

There are some basic items connected to a concept; one possible way to locate them is shown in Fig. 1 (Palomäki 1994):

*Fig. 1. Items connected to a concept*

A *term* is a linguistic entity. It *denotes* things and *connotes* a concept. A concept, in turn, has an *extension* and an *intension*. The extension of a concept is a *set* (or, more exactly, a *class*) of all those things that *fall under* the concept. Now, there may be many different terms which denote the same things but connote different concepts. That is, these different concepts have the same extension but differ in their intension. By the intension of a concept we mean something which we have to "understand" or "grasp" in order to use the concept in question correctly. Hence, we may say that the intension of a concept is that knowledge content of it which is required in order to recognize a thing as belonging to the extension of the concept in question (Kangassalo 1992/93, 2007).
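Different terms can denote the same things while connoting different concepts. The Python model below sketches this point. It is an illustrative model, not the scheme of Fig. 1 itself. The `Concept` class, the `TERMS` table, and the example data (the classic "morning star"/"evening star" pair) are assumptions introduced here, and an intension is simplified to a set of characteristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Toy concept: an intension (simplified to characteristics) and an extension."""
    name: str
    intension: frozenset  # what must be grasped to use the concept correctly
    extension: frozenset  # the things that fall under the concept


# Hypothetical data: two concepts that differ in intension but share an extension.
MORNING_STAR = Concept("morning star", frozenset({"celestial body", "seen at dawn"}), frozenset({"Venus"}))
EVENING_STAR = Concept("evening star", frozenset({"celestial body", "seen at dusk"}), frozenset({"Venus"}))

# Each term connotes a concept and denotes the things in that concept's extension.
TERMS = {"Phosphorus": MORNING_STAR, "Hesperus": EVENING_STAR}


def connotation(term: str) -> Concept:
    return TERMS[term]


def denotation(term: str) -> frozenset:
    return connotation(term).extension


print(denotation("Phosphorus") == denotation("Hesperus"))    # True: same things denoted
print(connotation("Phosphorus") == connotation("Hesperus"))  # False: different concepts
```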

Let $\mathcal{U} = \langle V, C, F \rangle$ be a universe of discourse, where i) $V$ is a universe of (possible) individuals, ii) $C$ is a universe of concepts, iii) $V \cap C = \emptyset$, and iv) $F \subseteq V \times C$ is the falls-under relation. Now, if $a$ is a concept, then for every (possible) individual $i$ in $V$, either $i$ falls under the concept $a$ or it does not, *i.e.*,

$$\text{if } a \in C, \text{ then } \forall i \in V:\; iFa \lor \neg\, iFa. \tag{1}$$

The extension relation $E_{\mathcal{U}}$ between a set $A$ and a concept $a$ in $V$ is defined as follows:

$$E_{\mathcal{U}}(A, a) =_{df} (\forall i)\,(i \in A \leftrightarrow i \in V \wedge iFa). \tag{2}$$

The extension of a concept $a$ may also be described as follows:

$$i \in E_{\mathcal{U}}'(a) \leftrightarrow iFa, \tag{3}$$

where $E_{\mathcal{U}}'(a)$ is the extension of the concept $a$ in $V$, *i.e.*, $E_{\mathcal{U}}'(a) = \{ i \in V \mid iFa \}$.
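Definitions (1) to (3) can be sketched over a finite universe. In this Python sketch the sets `V`, `C`, and `F` are hypothetical toy data. `falls_under`, `extension`, and `is_extension_of` are illustrative names for $iFa$, $E_{\mathcal{U}}'(a)$, and $E_{\mathcal{U}}(A, a)$. A universe of possible individuals is, of course, not in general a finite set.

```python
# Minimal sketch of a universe of discourse U = <V, C, F> as in (1)-(3).
# The individuals, concepts and falls-under pairs are hypothetical toy data.
V = frozenset({"rex", "tom", "tweety"})        # (possible) individuals
C = frozenset({"dog", "cat", "quadruped"})     # concepts
F = frozenset({                                # pairs (i, a) meaning iFa
    ("rex", "dog"), ("rex", "quadruped"),
    ("tom", "cat"), ("tom", "quadruped"),
})

assert V.isdisjoint(C)                         # iii) V and C share no elements
assert all(i in V and a in C for i, a in F)    # iv) F is a subset of V x C


def falls_under(i, a) -> bool:
    """iFa; for given i and a it is either True or False, as (1) requires."""
    return (i, a) in F


def extension(a) -> frozenset:
    """E'_U(a) = {i in V | iFa}, as in (3)."""
    return frozenset(i for i in V if falls_under(i, a))


def is_extension_of(A, a) -> bool:
    """E_U(A, a) per (2): for every i, i in A iff (i in V and iFa)."""
    # Checking i over V | A suffices: outside it, both sides of (2) are false.
    return all((i in A) == (i in V and falls_under(i, a)) for i in V | A)


print(sorted(extension("quadruped")))                     # ['rex', 'tom']
print(is_extension_of(frozenset({"rex"}), "dog"))         # True
print(is_extension_of(frozenset({"rex", "tom"}), "dog"))  # False
```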

#### **4.2 An intensional containment relation**


Now, the relations between concepts enable us to build conceptual structures. The basic relation between concepts is an intensional containment relation (see Kauppi 1967, Kangassalo 1992/93, Palomäki 1994), and it is this intensional containment relation between concepts that we call the IS-IN relation.

More formally, let there be given two concepts *a* and *b*. When a concept *a* intensionally contains a concept *b*, we may say that the intension of the concept *a* contains the intension of the concept *b*, or that the concept *a* intensionally entails the concept *b*, or that the intension of the concept *a* entails the intension of the concept *b*. This intensional containment relation is denoted as follows:

$$a \ge b.\tag{4}$$

Kauppi (1967) observed that

$$a \ge b \to (\forall i)\,(iFa \to iFb), \tag{5}$$

that is, that the transition from intensions to extensions reverses the containment relation, *i.e.*, the intensional containment relation between concepts *a* and *b* is converse to the extensional set-theoretical subset-relation between their extensions. Thus, by (3),

$$a \ge b \to E\_{\mathcal{U}} '(a) \subseteq E\_{\mathcal{U}} '(b) . \tag{6}$$

where "⊆" is the set-theoretical subset-relation, or the extensional inclusion relation between sets. Or, if we put *A* = *E*<sub>*U*</sub>'(*a*) and *B* = *E*<sub>*U*</sub>'(*b*), we get

$$a \ge b \to A \subseteq B.\,{}^{1} \tag{7}$$

For example, if the concept of a dog contains intensionally the concept of a quadruped, then the extension of the concept of the quadruped, *i.e.*, the set of four-footed animals, contains extensionally as a subset the extension of the concept of the dog, *i.e.*, the set of dogs. Observe, though, that we can deduce from concepts to their extensions, *i.e.*, sets, but not conversely, because for every set there may be many different concepts, whose extension that set is.
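The dog/quadruped example can be mirrored in a small Python sketch, under the assumption (ours, not the authors') that an intension is a set of defining marks and that *a* ≥ *b* means that *a* has every mark of *b*. The marks and the individuals in `THINGS` are invented for illustration. In this toy the inclusion of extensions follows by construction, and the last assertion shows why the inference cannot be run backwards: two concepts with different intensions may have the same extension.

```python
# Assumption (not from the text): an intension is modelled as a frozenset of
# defining marks, and a ≥ b is read as "a has every mark of b".
dog = frozenset({"animal", "four-footed", "barks"})
quadruped = frozenset({"animal", "four-footed"})
barker = frozenset({"barks"})

# Hypothetical individuals, each listed with the marks it actually has.
THINGS = {
    "rex": {"animal", "four-footed", "barks"},
    "tom": {"animal", "four-footed", "meows"},
    "tweety": {"animal", "two-footed", "sings"},
}


def contains(a: frozenset, b: frozenset) -> bool:
    """a ≥ b: the intension of a contains the intension of b."""
    return a >= b


def extension(a: frozenset) -> set:
    """The things falling under a: those having all of a's marks."""
    return {name for name, marks in THINGS.items() if a <= marks}


# (6)/(7): a ≥ b gives E(a) ⊆ E(b), i.e. the containment is reversed.
assert contains(dog, quadruped)
assert extension(dog) <= extension(quadruped)

# No converse inference: different intensions, same extension here.
assert dog != barker and extension(dog) == extension(barker)
```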

Formula (6) above is what Woods (1991) searched for without success; there the intensional containment relation is called a structural, or an intensional, subsumption relation.

#### **4.3 An intensional concept theory**

Based on the intensional containment relation between concepts, the late Professor Raili Kauppi presented her axiomatic intensional concept theory in Kauppi (1967), which is further studied in (Palomäki 1994). This axiomatic concept theory was inspired by Leibniz's logic: the intensional containment relation between concepts formalises the "*inesse*" relation<sup>2</sup> of Leibniz's logic.<sup>3</sup>

<sup>1</sup>In set theory, the subset-relation between sets *A* and *B* is defined by the ∈-relation between their elements as follows: *A* ⊆ *B* =df ∀*x* (*x* ∈ *A* → *x* ∈ *B*). Unfortunately, both the ∈-relation and the ⊆-relation are called IS-A relations, although they are different relations. On the other hand, we can take the intensional containment relation between concepts *a* and *b*, *i.e*., *a* ≥ *b*, to be the IS-IN relation.

An intensional concept theory, denoted by *KC*, is presented in a first-order language *L* that contains individual variables *a*, *b*, *c*, ..., which range over *concepts*, and one non-logical two-place *intensional containment relation*, denoted by "≥". We shall first present four basic relations between concepts defined by "≥", and then, briefly, the basic axioms of the theory. For a more complete presentation of the theory, see Kauppi (1967) and Palomäki (1994).

Two concepts *a* and *b* are said to be *comparable*, denoted by *a* H *b*, if there exists a concept *x* which is intensionally contained in both.

$$\text{Df}_{\text{H}} \qquad a \mathrel{\text{H}} b =_{\text{df}} (\exists x)\,(a \ge x \land b \ge x).$$

If two concepts *a* and *b* are not comparable, they are *incomparable*, which is denoted by *a* I *b*.

$$\text{Df}\_{\text{I}} \qquad \qquad a \text{I} b =\_{\text{df}} \sim a \text{H} b.$$

Dually, two concepts *a* and *b* are said to be *compatible*, denoted by *a* ⊥ *b*, if there exists a concept *x* which intensionally contains both.

$$\text{Df}\_{\perp} \qquad \qquad a \perp b =\_{\text{df}} \left( \exists x \right) \left( x \ge a \land x \ge b \right)$$

If two concepts *a* and *b* are not compatible, they are *incompatible*, which is denoted by *a* Y *b*.

$$\text{Df}\_{\text{Y}} \qquad a \text{Y} b =\_{\text{df}} \sim a \perp b.$$

The first two axioms of *KC* state that the intensional containment relation is a *reflexive* and *transitive* relation.

$$\begin{aligned} \mathbf{A} \mathbf{x}\_{\text{Refl}} &\quad a \ge a. \\ \mathbf{A} \mathbf{x}\_{\text{Trans}} &\quad a \ge b \land b \ge c \to a \ge c. \end{aligned}$$

Two concepts *a* and *b* are said to be *intensionally identical*, denoted by *a* ≈ *b*, if the concept *a*  intensionally contains the concept *b*, and the concept *b* intensionally contains the concept *a*.

$$\text{Df}\_{\approx} \qquad a \approx b =\_{\text{df}} a \ge b \land b \ge a.$$
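The four relations and the intensional identity can be checked by brute force in a finite toy universe. The sketch below is only an assumption-laden illustration: concepts are modelled as sets of marks, some pairs of marks (`EXCLUSIVE`) are assumed to be mutually exclusive, and *a* ≥ *b* is modelled as set inclusion, which is reflexive and transitive as Ax<sub>Refl</sub> and Ax<sub>Trans</sub> require. Since the empty mark set belongs to `CONCEPTS`, every pair of concepts turns out comparable in this toy.

```python
from itertools import combinations

# Hypothetical toy universe of concepts (an assumption, not KC's own model):
# a concept is a set of marks containing no mutually exclusive pair.
MARKS = ["animal", "male", "female", "adult", "child"]
EXCLUSIVE = [{"male", "female"}, {"adult", "child"}]
CONCEPTS = [
    frozenset(c)
    for r in range(len(MARKS) + 1)
    for c in combinations(MARKS, r)
    if not any(pair <= set(c) for pair in EXCLUSIVE)
]


def ge(a, b):
    """a ≥ b, modelled as mark-set inclusion (reflexive and transitive)."""
    return a >= b


def comparable(a, b):    # Df_H: (∃x)(a ≥ x ∧ b ≥ x)
    return any(ge(a, x) and ge(b, x) for x in CONCEPTS)


def incomparable(a, b):  # Df_I: ~ a H b
    return not comparable(a, b)


def compatible(a, b):    # Df_⊥: (∃x)(x ≥ a ∧ x ≥ b)
    return any(ge(x, a) and ge(x, b) for x in CONCEPTS)


def incompatible(a, b):  # Df_Y: ~ a ⊥ b
    return not compatible(a, b)


def identical(a, b):     # Df_≈: a ≥ b ∧ b ≥ a
    return ge(a, b) and ge(b, a)


man = frozenset({"male", "adult"})
woman = frozenset({"female", "adult"})
print(comparable(man, woman), compatible(man, woman))  # True False
```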

<sup>2</sup>Literally, "*inesse*" is "being-in", and this term was used by Scholastic translators of Aristotle to render the Greek "huparchei", *i.e.*, "belongs to", (Leibniz 1997, 18, 243).

<sup>3</sup>Cf. "*Definition 3*. That A 'is in' L, or, that L 'contains' A, is the same that L is assumed to be coincident with several terms taken together, among which is A", (Leibniz after 1690, 132). Also, e.g., in a letter to Arnauld of 14 July 1686, Leibniz wrote, (Leibniz 1997, 62): "[I]n every affirmative true proposition, necessary or contingent, universal or singular, the notion of the predicate is contained in some way in that of the subject, *praedicatum inest subjecto* [the predicate is included in the subject]. Or else I do not know what truth is." This view may be called the conceptual containment theory of truth, (Adams 1994, 57), which is closely associated with Leibniz's preference for an "intensional" as opposed to an "extensional" interpretation of categorical propositions. Leibniz worked out a variety of both intensional and extensional treatments of the logic of predicates, *i*.*e*., concepts, but preferred the intensional approach, (Kauppi 1960, 220, 251, 252).


The intensional identity is clearly a reflexive, symmetric and transitive relation, hence an equivalence relation.

A concept *c* is called an *intensional product* of two concepts *a* and *b*, if any concept *x* is intensionally contained in *c* if and only if it is intensionally contained in both *a* and *b*. If two concepts *a* and *b* have an intensional product, it is unique up to the intensional identity, and we then denote it by *a* ⊗ *b*.

$$\text{Df}_{\otimes} \qquad c \approx a \otimes b =_{\text{df}} (\forall x)\,(c \ge x \leftrightarrow a \ge x \land b \ge x).$$

The following axiom Ax<sub>⊗</sub> of *KC* states that if two concepts *a* and *b* are comparable, there exists a concept *x* which is their intensional product.

$$\text{Ax}_{\otimes} \qquad a \mathrel{\text{H}} b \to (\exists x)\,(x \approx a \otimes b).$$

It is easy to show that the intensional product is idempotent, commutative, and associative.

A concept *c* is called an *intensional sum* of two concepts *a* and *b*, if the concept *c* is intensionally contained in any concept *x* if and only if *x* intensionally contains both *a* and *b*. If two concepts *a* and *b* have an intensional sum, it is unique up to the intensional identity, and we then denote it by *a* ⊕ *b*.

$$\text{Df}_{\oplus} \qquad c \approx a \oplus b =_{\text{df}} (\forall x)\,(x \ge c \leftrightarrow x \ge a \land x \ge b).^{4}$$

The following axiom Ax<sub>⊕</sub> of *KC* states that if two concepts *a* and *b* are compatible, there exists a concept *x* which is their intensional sum.

$$\text{Ax}_{\oplus} \qquad a \perp b \to (\exists x)\,(x \approx a \oplus b).$$

The intensional sum is idempotent, commutative, and associative.

The intensional product of two concepts *a* and *b* is intensionally contained in their intensional sum whenever both sides are defined.

$$\text{Th}\,\mathbf{1} \qquad a \oplus b \geq a \otimes b.$$

*Proof:* If *a* ⊗ *b* exists, then by Df<sub>⊗</sub>, *a* ≥ *a* ⊗ *b* and *b* ≥ *a* ⊗ *b*. Similarly, if *a* ⊕ *b* exists, then by Df<sub>⊕</sub>, *a* ⊕ *b* ≥ *a* and *a* ⊕ *b* ≥ *b*. Hence, by Ax<sub>Trans</sub>, the theorem follows.
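Continuing with a hypothetical mark-set universe (an assumption, not the model given by Kauppi or Palomäki), the intensional product and sum can be searched for directly from Df<sub>⊗</sub> and Df<sub>⊕</sub>, and Th 1 can then be checked for every pair whose sum exists. With the toy concepts `man` and `woman`, the product is the concept of "adult", while the sum does not exist, because no concept in the toy contains both. The function names `product` and `intensional_sum` are illustrative only.

```python
from itertools import combinations

# Hypothetical mark-set toy universe (an assumption, not KC's own model).
MARKS = ["animal", "male", "female", "adult", "child"]
EXCLUSIVE = [{"male", "female"}, {"adult", "child"}]
CONCEPTS = [
    frozenset(c)
    for r in range(len(MARKS) + 1)
    for c in combinations(MARKS, r)
    if not any(pair <= set(c) for pair in EXCLUSIVE)
]


def ge(a, b):
    """a ≥ b, modelled as mark-set inclusion."""
    return a >= b


def product(a, b):
    """a ⊗ b by brute force on Df_⊗: c with (∀x)(c ≥ x ↔ a ≥ x ∧ b ≥ x)."""
    for c in CONCEPTS:
        if all(ge(c, x) == (ge(a, x) and ge(b, x)) for x in CONCEPTS):
            return c
    return None  # no intensional product in this universe


def intensional_sum(a, b):
    """a ⊕ b by brute force on Df_⊕: c with (∀x)(x ≥ c ↔ x ≥ a ∧ x ≥ b)."""
    for c in CONCEPTS:
        if all(ge(x, c) == (ge(x, a) and ge(x, b)) for x in CONCEPTS):
            return c
    return None  # no concept contains both a and b


man = frozenset({"male", "adult"})
woman = frozenset({"female", "adult"})
print(sorted(product(man, woman)), intensional_sum(man, woman))  # ['adult'] None
```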

A concept *b* is an *intensional negation* of a concept *a*, denoted by ¬*a*, if and only if it is intensionally contained in all those concepts *x*, which are intensionally incompatible with the concept *a*. When ¬*a* exists, it is unique up to the intensional identity.

$$\text{Df}_{\neg} \qquad b \approx \neg a =_{\text{df}} (\forall x)\,(x \ge b \leftrightarrow x \mathrel{\text{Y}} a).$$

The following axiom Ax¬ of *KC* states that if there is a concept *x* which is incompatible with the concept *a*, there exists a concept *y*, which is the intensional negation of the concept *a*.

<sup>4</sup>Thus, [*a* ⊗ *b*] = [*a*] ∧ [*b*] is the greatest lower bound of [*a*] and [*b*] in *C*/≈, whereas [*a* ⊕ *b*] = [*a*] ∨ [*b*] is their least upper bound in *C*/≈.


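$$\text{Ax}_{\neg} \qquad (\exists x)\,(x \mathrel{\text{Y}} a) \to (\exists y)\,(y \approx \neg a).$$

It can be proved that a concept *a* intensionally contains its intensional double negation, provided that it exists.

$$\text{Th } 2 \qquad a \ge \neg\neg a.^{5}$$

*Proof:* By Df<sub>¬</sub>, the equivalence (1): *b* ≥ ¬*a* ↔ *b* Y *a* holds. By substituting ¬*a* for *b* in (1), we get ¬*a* ≥ ¬*a* ↔ ¬*a* Y *a*, and so, by Ax<sub>Refl</sub>, we get (2): ¬*a* Y *a*. Then, by substituting *a* for *b* and ¬*a* for *a* in (1), we get *a* ≥ ¬¬*a* ↔ *a* Y ¬*a*, and hence, by (2) and the symmetry of Y, the theorem follows.

Also, the following forms of *De Morgan's formulas* can be proved whenever both sides are defined:

$$\begin{aligned} \text{Th } 3 \quad \text{i)}\ \neg a \otimes \neg b &\ge \neg(a \oplus b), \\ \text{ii)}\ \neg(a \otimes b) &\approx \neg a \oplus \neg b. \end{aligned}$$

*Proof:* First we prove the following important lemma:

#### Lemma 1 *a* ≥ *b* → ¬*b* ≥ ¬*a*.

*Proof:* From *a* ≥ *b* it follows that (∀*x*) (*x* Y *b* → *x* Y *a*), and thus, by Df<sub>¬</sub>, Lemma 1 follows.

i. If *a* ⊕ *b* exists, then by Df<sub>⊕</sub>, *a* ⊕ *b* ≥ *a* and *a* ⊕ *b* ≥ *b*. By Lemma 1 we get ¬*a* ≥ ¬(*a* ⊕ *b*) and ¬*b* ≥ ¬(*a* ⊕ *b*). Then, by Df<sub>⊗</sub>, Th 3 i) follows.

ii. This is proved in four steps as follows:

1. ¬(*a* ⊗ *b*) ≥ ¬*a* ⊕ ¬*b*. Since *a* ≥ *a* ⊗ *b*, it follows by Lemma 1 that ¬(*a* ⊗ *b*) ≥ ¬*a*; likewise, since *b* ≥ *a* ⊗ *b*, ¬(*a* ⊗ *b*) ≥ ¬*b*. Thus, by Df<sub>⊕</sub>, 1 holds.
2. ¬(¬¬*a* ⊗ ¬¬*b*) ≥ ¬(*a* ⊗ *b*). Since *a* ≥ ¬¬*a* and *b* ≥ ¬¬*b*, by Th 2, it follows by Df<sub>⊗</sub> (together with Ax<sub>Trans</sub>) that *a* ⊗ *b* ≥ ¬¬*a* ⊗ ¬¬*b*. Thus, by Lemma 1, 2 holds.
3. ¬¬*a* ⊗ ¬¬*b* ≥ ¬(¬*a* ⊕ ¬*b*). Since *a* ⊕ *b* ≥ *a* and *a* ⊕ *b* ≥ *b*, it follows by Lemma 1 that ¬*a* ≥ ¬(*a* ⊕ *b*) and ¬*b* ≥ ¬(*a* ⊕ *b*), and so, by Df<sub>⊗</sub>, ¬*a* ⊗ ¬*b* ≥ ¬(*a* ⊕ *b*). Thus, by substituting ¬*a* for *a* and ¬*b* for *b* in it, 3 holds.
4. ¬*a* ⊕ ¬*b* ≥ ¬(*a* ⊗ *b*). We have ¬*a* ⊕ ¬*b* ≥ ¬¬(¬*a* ⊕ ¬*b*) by Th 2, and from 3 it follows by Lemma 1 that ¬¬(¬*a* ⊕ ¬*b*) ≥ ¬(¬¬*a* ⊗ ¬¬*b*); so, by Ax<sub>Trans</sub>, we get ¬*a* ⊕ ¬*b* ≥ ¬(¬¬*a* ⊗ ¬¬*b*). Thus, by 2 and by Ax<sub>Trans</sub>, 4 holds.

From 1 and 4, by Df<sub>≈</sub>, Th 3 ii) follows.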
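The intensional negation can likewise be searched for by brute force on Df<sub>¬</sub> in a hypothetical mark-set universe. This toy is not a model of *KC*: for instance, {"male", "adult"} has an incompatible concept but no intensional negation, so Ax<sub>¬</sub> fails there. Since the proofs of Lemma 1 and Th 2 above appear to rely only on the definitions and on the order properties of ≥, it still seems reasonable to check them wherever the negations exist. The example also shows the converse of Th 2 failing without Ax<sub>¬¬</sub>, as footnote 5 states. The names `neg` and `NEG` are illustrative.

```python
from itertools import combinations

# Hypothetical mark-set toy universe (an assumption). It does not satisfy
# every KC axiom: e.g. Ax_¬ fails for {"male", "adult"}.
MARKS = ["animal", "male", "female", "adult", "child"]
EXCLUSIVE = [{"male", "female"}, {"adult", "child"}]
CONCEPTS = [
    frozenset(c)
    for r in range(len(MARKS) + 1)
    for c in combinations(MARKS, r)
    if not any(pair <= set(c) for pair in EXCLUSIVE)
]


def ge(a, b):
    """a ≥ b, modelled as mark-set inclusion."""
    return a >= b


def incompatible(a, b):
    """a Y b: no concept in the universe contains both a and b."""
    return not any(ge(x, a) and ge(x, b) for x in CONCEPTS)


def neg(a):
    """¬a by brute force on Df_¬: b with (∀x)(x ≥ b ↔ x Y a), or None."""
    for b in CONCEPTS:
        if all(ge(x, b) == incompatible(x, a) for x in CONCEPTS):
            return b
    return None  # no intensional negation in this universe


NEG = {a: neg(a) for a in CONCEPTS}  # precomputed negations

a = frozenset({"animal", "male"})
print(sorted(NEG[a]), sorted(NEG[NEG[a]]))  # ['female'] ['male']
```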

If a concept *a* is intensionally contained in every concept *x*, the concept *a* is called a *general concept,* and it is denoted by *G*. The general concept is unique up to the intensional identity, and it is defined as follows:

$$\text{Df}\_{\text{G}} \qquad \qquad a \approx \text{G} =\_{\text{df}} \left( \forall \mathfrak{x} \right) \left( \mathfrak{x} \ge a \right).$$

The next axiom of *KC* states that there is a concept, which is intensionally contained in every concept.

<sup>5</sup>This relation does not hold conversely without stating a further axiom for intensional double negation, *i.e.*, Ax<sub>¬¬</sub>: *b* Y ¬*a* → *b* ≥ *a*. Thus, ¬¬*a* ≥ *a*, and hence, by Th 2, *a* ≈ ¬¬*a* holds only if the concept *a* is intensionally contained in every concept *b* which is incompatible with the intensional negation of the concept *a*.

$$\text{Ax}_{\text{G}} \qquad (\exists x)\,(\forall y)\,(y \ge x).$$

Adopting the axiom of the general concept it follows that all concepts are to be comparable. Since the general concept is compatible with every concept, it has no intensional negation.

A *special concept* is a concept *a*, which is not intensionally contained in any other concept except for concepts intensionally identical to itself. Thus, there can be many special concepts.

$$\text{Df}_{\text{S}} \qquad S(a) =_{\text{df}} (\forall x)\,(x \ge a \to a \ge x).$$

The last axiom of *KC* states that there is for any concept *y* a special concept *x* in which it is intensionally contained.

$$\text{Ax}_{\text{S}} \qquad (\forall y)\,(\exists x)\,(S(x) \land x \ge y).$$

Since the special concept *s* is either compatible or incompatible with every concept, the *law of excluded middle* holds for *s*, so that for any concept *x* which has an intensional negation, either the concept *x* or its intensional negation ¬*x* is intensionally contained in it. Hence, we have

$$\text{Th } 4 \qquad (\forall x)\,\big(S(s) \to (s \ge x \lor s \ge \neg x)\big).$$

A special concept, which corresponds to Leibniz's complete concept of an individual, would contain one member of every pair of mutually incompatible concepts.

By the *Completeness Theorem*, every consistent first-order theory has a model. Accordingly, in Palomäki (1994, 94-97) a model of *KC* + Ax<sub>¬¬</sub> is found to be a *complete semilattice*, where every concept *a* ∈ *C* defines a *Boolean algebra* *B*<sub>*a*</sub> = <↓*a*, ⊗, ⊕, ¬, *G*, *a*>, where ↓*a* is an ideal, known as the *principal ideal generated by a*, *i.e.*, ↓*a* =df {*x* ∈ *C* | *a* ≥ *x*}, and the intensional negation of a concept *b* ∈ ↓*a* is interpreted as the *relative complement* of *b* with respect to *a*.
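For the Boolean algebra *B*<sub>*a*</sub>, a small sketch in a hypothetical mark-set universe may help. It assumes that, inside the principal ideal ↓*a*, the intensional product and sum reduce to intersection and union of mark sets (`meet` and `join` below are our stand-ins, not Palomäki's construction); the negation of *b* ∈ ↓*a*, read as a relative complement, is then simply `a - b`, and the usual Boolean laws can be checked exhaustively.

```python
from itertools import combinations

# Hypothetical mark-set toy universe (an assumption, not Palomäki's model).
MARKS = ["animal", "male", "female", "adult", "child"]
EXCLUSIVE = [{"male", "female"}, {"adult", "child"}]
CONCEPTS = [
    frozenset(c)
    for r in range(len(MARKS) + 1)
    for c in combinations(MARKS, r)
    if not any(pair <= set(c) for pair in EXCLUSIVE)
]

G = frozenset()                            # general concept, bottom of ↓a
a = frozenset({"animal", "male", "adult"})
IDEAL = [x for x in CONCEPTS if a >= x]    # ↓a = {x ∈ C | a ≥ x}


def meet(x, y):
    """Toy stand-in for ⊗ inside ↓a: intersection of mark sets."""
    return x & y


def join(x, y):
    """Toy stand-in for ⊕ inside ↓a: union of mark sets (stays inside ↓a)."""
    return x | y


def rel_complement(b):
    """Negation of b ∈ ↓a read as its relative complement with respect to a."""
    return a - b


print(len(IDEAL), sorted(rel_complement(frozenset({"male"}))))  # 8 ['adult', 'animal']
```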

It should be emphasized that in *KC* concepts in general do not form a lattice structure as they do, for example, in Formal Concept Analysis, (Ganter & Wille, 1998). Only in a very special case do concepts in *KC* form a lattice structure; that is, when all the concepts are both comparable and compatible, in which case there will be no incompatible concepts and, hence, no intensional negation of a concept either.<sup>6</sup>

#### **5. That IS-IN Isn't IS-A**

12 Advances in Knowledge Representation


In current literature, the relations between concepts are mostly based on the set-theoretical relations between the extensions of concepts. For example, in Nebel & Smolka (1990), the conceptual intersection of the concepts of "man" and "woman" is the empty-concept, and their conceptual union is the concept of "adult". However, intensionally, the common concept which both the concept of "man" and that of "woman" contain, and which is therefore their intensional conceptual intersection, is the concept of "adult", not the empty-concept; and the concept in which they both are contained, and which is therefore their intensional conceptual union, is the concept of "androgyne", not the concept of "adult". Moreover, if the extension of the empty-concept is an empty set, then it would follow that the concepts of "androgyne", "centaur", and "round-square" are all equivalent to the empty-concept, which is absurd. Thus, although Nebel and Smolka are talking about concepts, they are dealing with them only in terms of extensional set theory, not intensional concept theory.

<sup>6</sup>For how this intensional concept theory *KC* is used in the context of conceptual modelling, *i.e*., when developing conceptual schemata, see especially (Kangassalo 1992/93, 2007).

There are several reasons to separate intensional concept theory from extensional set theory, (Palomäki 1994). For instance: i) intensions determine extensions, but not conversely, ii) whether a thing belongs to a set is decided primarily by intension, iii) a concept can be used meaningfully even when there are not yet, nor ever will be, any individuals belonging to the extension of the concept in question, iv) there can be many non-identical but co-extensional concepts, v) the extension of a concept may vary according to context, and vi) from Gödel's two Incompleteness Theorems it follows that intensions cannot be wholly eliminated from set theory.

One difference between extensionality and intensionality is that in extensionality a collection is determined by its elements, whereas in intensionality a collection is determined by a concept, a property, an attribute, etc. This means, for example, that when we are creating a semantic network or a conceptual model by using an extensional IS-A relation as its taxonomic link, the existence of the objects to be modelled is presupposed, whereas by using an intensional IS-IN relation between concepts, the existence of objects falling under those concepts is not presupposed. This difference is crucial when we are designing an object which does not yet exist, but about which we have plenty of conceptual information, and we are building a conceptual model of it. In the set-theoretical IS-A approach to a taxonomy the Universe of Discourse consists of individuals, whereas in the intensional concept-theoretical IS-IN approach to a taxonomy the Universe of Discourse consists of concepts. Thus, in the extensional approach we are moving from objects towards concepts, whereas in the intensional approach we are moving from concepts towards objects.

However, it seems that from a strictly extensional approach we are not able to reach concepts without intensionality. The principle of extensionality in set theory is given by a first-order formula as follows,

$$\forall A\, \forall B\, \big(\forall x\, (x \in A \leftrightarrow x \in B) \to A = B\big).$$

That is, if two *sets* have exactly the same members, then they are equal. Now, what is a set? There are two ways to form a set: i) extensionally, by listing all the elements of the set, for example, *A* = {*a*, *b*, *c*}, or ii) intensionally, by giving the defining property *P*(*x*) which the elements of the set have to satisfy in order to belong to it, for example, *B* = {*x* | *blue*(*x*)}, where the set *B* is the set of all blue things.<sup>7</sup> Moreover, if we then write "*x* ∈ *B*", we use the symbol ∈ to denote membership. It abbreviates the Greek word έστί, which means "is", and it asserts that *x is* blue. Now, intensionality is implicitly present when we select the members of a set by some definite property *P*(*x*), *i.e*., we have to understand the property of being *blue*, for instance, in order to select the possible members of the set of all blue things (from the given Universe of Discourse).

<sup>7</sup>In pure mathematics there are only sets, and a "definite" property, which appears for example in the axiom schemata of separation and replacement in Zermelo-Fraenkel set theory, is one that could be formulated as a first-order theory whose atomic formulas are limited to set membership and identity. However, set theory is of no practical use in itself, but is applied to other things as well. We assume a theory *T*, and we shall call the objects in the domain of interpretation of *T* *individuals* (or *atoms*, or *Urelements*). To include the individuals, we introduce a predicate *U*(*x*) to mean that *x* is an individual, and then we relativize all the axioms of *T* to *U*. That is, we replace every universal quantifier "∀*x*" in an axiom of *T* with "∀*x* (*U*(*x*) → ...)" and every existential quantifier "∃*x*" with "∃*x* (*U*(*x*) ∧ ...)", and for every constant "*a*" in the language of *T* we add *U*(*a*) as a new axiom.

An extensional view of concepts is indeed untenable. The fundamental property that makes extensions extensional is that concepts have the same extensions in case they have the same instances. Accordingly, if we use {*x* | *a*(*x*)} and {*x* | *b*(*x*)} to denote the extensions of the concepts *a* and *b*, respectively, we can express extensionality by means of the second-order principle,

$$\forall a\, \forall b\, \big(\{x \mid a(x)\} = \{x \mid b(x)\} \to a = b\big). \tag{*}$$

However, by accepting that principle some very implausible consequences follow. For example, according to physiologists any creature with a heart also has a kidney, and *vice versa*. So the concepts of "heart" and "kidney" are co-extensional concepts, and then, by the principle (\*), the concepts of "heart" and "kidney" are 'identical' or interchangeable concepts. On the other hand, distinguishing between the concepts of "heart" and "kidney" is very relevant, for instance, when someone has a heart attack and the surgeon, who is a passionate extensionalist, prefers to operate on his kidney instead of his heart.

**5.1 Intensionality in possible worlds semantic approach** 

Intensional notions (e.g. concepts) are not strictly formal notions, and it would be misleading to take them as subjects of study for logic only, since logic is concerned with the forms of propositions as distinct from their contents. Perhaps the only part of the theory of intensionality which can be called formal is pure modal logic and its possible worlds semantics. However, in concept theories based on possible worlds semantics, (see e.g. Hintikka 1969, Montague 1974, Palomäki 1997, Duzi et al. 2010), intensional notions are defined as (possibly partial, but indeed set-theoretical) functions from the possible worlds to extensions in those worlds.

Also Nicola Guarino, in his key article on "ontology" (1998), where he emphasized the intensional aspect of modelling, started to formalize his account of "ontology"<sup>8</sup> by possible world semantics, in spite of being aware that the possible world approach has some disadvantages; for instance, the two concepts "trilateral" and "triangle" turn out to be the same, as they have the same extension in all possible worlds.

<sup>8</sup>From Guarino's (1998) formalization of his view of "ontology", we learn that "ontology" for him is a set of axioms (a language) such that its intended models approximate as well as possible the conceptualization of the world. He also emphasizes that "it is important to stress that an ontology is *language-dependent*, while a conceptualization is *language-independent*." Here the word "conceptualization" means "a set of conceptual relations defined on a domain space", whereas by "the ontological commitments" he means the relation between the language and the conceptualization. This kind of language-dependent view of "ontology", as well as other non-traditional uses of the word "ontology", is analyzed and criticized in Palomäki (2009).
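The heart/kidney argument and Guarino's trilateral/triangle example can be contrasted in a small Python sketch. All data are hypothetical: `WORLDS` assigns made-up extensions to four concepts in two worlds, and the world `w1`, containing a creature with a heart but no kidney, is purely an assumption. Comparing only actual extensions, as principle (\*) does, identifies "heart" and "kidney"; treating an intension as a function from worlds to extensions separates them, yet still identifies "triangle" and "trilateral".

```python
# Hypothetical data: each possible world assigns an extension to each concept.
# World "w1" (a creature with a heart but no kidney) is purely an assumption.
WORLDS = {
    "actual": {
        "heart": {"ann", "bob"},   # creatures with a heart
        "kidney": {"ann", "bob"},  # creatures with a kidney
        "triangle": {"t1", "t2"},
        "trilateral": {"t1", "t2"},
    },
    "w1": {
        "heart": {"ann", "cy"},
        "kidney": {"ann"},
        "triangle": {"t3"},
        "trilateral": {"t3"},
    },
}


def extension(concept, world="actual"):
    """The extension of a concept in a given world."""
    return WORLDS[world][concept]


def coextensional(a, b):
    """What principle (*) looks at: the extensions in the actual world."""
    return extension(a) == extension(b)


def intension(concept):
    """Possible-worlds intension: a function from worlds to extensions,
    represented here as a dict."""
    return {w: extension(concept, w) for w in WORLDS}


print(coextensional("heart", "kidney"),
      intension("heart") == intension("kidney"),
      intension("triangle") == intension("trilateral"))  # True False True
```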


terms of extensional set theory, not intensional concept theory.

intensional approach we moving from concepts towards objects.

constant "*a*" in the language of *T* we add *U*(*a*) as new axiom.

theory.

order formula as follows,

 to denote the membership. It abbreviates the Greek word *έστί*, which means "is", and it asserts that *x is* blue. Now, the intensionality is implicitly present when we are selecting the members of a set by some definite property *P*(*x*), *i.e*., we have to understand the property of being *blue*, for instance, in order to select the possible members of the set of all blue things (from the given Universe of Discourse).
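The contrast drawn in this passage, between an extensional IS-A link that needs existing individuals and an intensional IS-IN link that does not, can be made concrete with a small toy model. This is our own sketch, not the authors' concept theory *KC*: it assumes, as a deliberate simplification, that a concept's intension can be approximated by a finite set of defining attributes, and the names `extension`, `is_a` and `is_in` are hypothetical. Here `is_in(a, b)` is read as "the defining attributes of *a* are contained in those of *b*", as the genus is in the species.

```python
from typing import FrozenSet, List

# Toy model (assumption): an intension is a finite set of defining attributes,
# and an individual is represented by the set of attributes it actually has.
Concept = FrozenSet[str]
Individual = FrozenSet[str]


def extension(concept: Concept, universe: List[Individual]) -> FrozenSet[Individual]:
    """Form a set intensionally: select the individuals satisfying the defining property.

    The selection cannot be carried out without consulting the intension.
    """
    return frozenset(x for x in universe if concept <= x)


def is_a(sub: Concept, sup: Concept, universe: List[Individual]) -> bool:
    """Extensional IS-A: every existing instance of `sub` is an instance of `sup`."""
    return extension(sub, universe) <= extension(sup, universe)


def is_in(part: Concept, whole: Concept) -> bool:
    """Intensional IS-IN: the defining attributes of `part` are contained in `whole`."""
    return part <= whole


vehicle: Concept = frozenset({"has_wheels"})
electric_vehicle: Concept = frozenset({"has_wheels", "battery_powered"})
boat: Concept = frozenset({"floats"})

# 1. Design phase: nothing has been built yet, so every extension is empty and
#    the extensional test cannot tell the concepts apart.
nothing_built: List[Individual] = []
print(is_a(boat, electric_vehicle, nothing_built))   # True (empty set is a subset of empty set)
print(is_in(vehicle, electric_vehicle))              # True
print(is_in(boat, electric_vehicle))                 # False

# 2. Once individuals exist, a set formed by listing and a set formed by a
#    defining property are equal whenever they have the same members.
car: Individual = frozenset({"has_wheels", "battery_powered"})
bike: Individual = frozenset({"has_wheels"})
built: List[Individual] = [car, bike]
listed = frozenset({car})                            # extensionally, by listing
defined = extension(electric_vehicle, built)         # intensionally, by a property
print(listed == defined)                             # True
print(is_a(electric_vehicle, vehicle, built))        # True
```

In the design phase the extensional test succeeds vacuously for any pair of concepts, so an IS-A taxonomy built before any objects exist carries no information, whereas the IS-IN test depends only on the concepts themselves.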

An extensional view of concepts is, indeed, untenable. The fundamental property that makes extensions extensional is that concepts have the same extension if and only if they have the same instances. Accordingly, if we use {*x* | *a*(*x*)} and {*x* | *b*(*x*)} to denote the extensions of the concepts *a* and *b*, respectively, we can express extensionality by means of the second-order principle:

$$\forall a \, \forall b \, \big( \forall x \, (a(x) \equiv b(x)) \leftrightarrow \{ x \mid a(x) \} = \{ x \mid b(x) \} \big) \tag{*}$$

However, accepting this principle has some very implausible consequences. For example, according to physiologists any creature with a heart also has a kidney, and *vice versa*. So the concepts of "creature with a heart" and "creature with a kidney" are co-extensional, and then, by the principle (*), they are 'identical' or interchangeable concepts. On the other hand, distinguishing between the concepts of "heart" and "kidney" is very relevant, for instance, when someone has a heart attack and the surgeon, who is a passionate extensionalist, prefers to operate on the patient's kidney instead of the heart.
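The collapse described above can be reproduced in a few lines. In the following hypothetical sketch (our illustration, not the authors' formalism) a concept's intension is approximated by a finite set of defining attributes, and a small made-up universe stands in for the physiologists' observation that every creature with a heart also has a kidney. Identifying concepts by their extensions, as principle (*) licenses, merges "creature with a heart" with "creature with a kidney", and likewise any two concepts with empty extensions, such as "centaur" and "round square"; comparing intensions keeps them apart.

```python
from typing import Dict, FrozenSet

# Assumption: an intension is approximated by a set of defining attributes.
CONCEPTS: Dict[str, FrozenSet[str]] = {
    "creature with a heart": frozenset({"has_heart"}),
    "creature with a kidney": frozenset({"has_kidney"}),
    "centaur": frozenset({"human_torso", "horse_body"}),
    "round square": frozenset({"round", "square"}),
}

# Made-up universe of discourse: every creature with a heart also has a kidney.
UNIVERSE: Dict[str, FrozenSet[str]] = {
    "rex": frozenset({"has_heart", "has_kidney", "barks"}),
    "tom": frozenset({"has_heart", "has_kidney", "meows"}),
    "pebble": frozenset({"grey", "round"}),
}


def extension(concept: str) -> FrozenSet[str]:
    """Names of the individuals that have all the defining attributes of `concept`."""
    attrs = CONCEPTS[concept]
    return frozenset(name for name, has in UNIVERSE.items() if attrs <= has)


def same_extension(a: str, b: str) -> bool:
    """Identity criterion licensed by principle (*): co-extensional means identical."""
    return extension(a) == extension(b)


def same_intension(a: str, b: str) -> bool:
    """Intensional criterion: the defining attributes themselves must coincide."""
    return CONCEPTS[a] == CONCEPTS[b]


for a, b in [("creature with a heart", "creature with a kidney"), ("centaur", "round square")]:
    print(f"{a!r} vs {b!r}: extensional {same_extension(a, b)}, intensional {same_intension(a, b)}")
```

Nothing in the toy universe separates the heart and kidney concepts; only the intensional comparison, which never consults the universe, distinguishes them.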

#### **5.1 Intensionality in the possible worlds semantic approach**

Intensional notions (e.g. concepts) are not strictly formal notions, and it would be misleading to take them as subjects of study for logic only, since logic is concerned with the forms of propositions as distinct from their contents. Perhaps the only part of the theory of intensionality that can be called formal is pure modal logic with its possible worlds semantics. However, in concept theories based on possible worlds semantics (see e.g. Hintikka 1969, Montague 1974, Palomäki 1997, Duzi et al. 2010), intensional notions are defined as (possibly partial, but nevertheless set-theoretical) functions from possible worlds to extensions in those worlds.

Nicola Guarino, too, in his key article on "ontology" (1998), where he emphasized the intensional aspect of modelling, began to formalize his account of "ontology"<sup>8</sup> using possible worlds semantics, despite being aware that the possible worlds approach has some disadvantages; for instance, the two concepts "trilateral" and "triangle" turn out to be the same, as they have the same extension in all possible worlds.

<sup>8</sup>From Guarino's (1998) formalization of his view of "ontology", we learn that an "ontology" for him is a set of axioms (a language) such that its intended models approximate as well as possible the conceptualization of the world. He also emphasizes that "it is important to stress that an ontology is *language-dependent*, while a conceptualization is *language-independent*." Here the word "conceptualization" means "a set of conceptual relations defined on a domain space", whereas by "the ontological commitments" he means the relation between the language and the conceptualization. This kind of language-dependent view of "ontology", as well as other non-traditional uses of the word "ontology", is analyzed and criticized in Palomäki (2009).
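The possible worlds treatment of intensions, and the "trilateral"/"triangle" problem, can be sketched as follows. This is an illustrative toy, not Guarino's formalization: the worlds are all subsets of a small, made-up pool of polygons, and we assume that in every possible world a polygon has as many angles as sides. Intensions are modelled, as in possible worlds semantics, as functions from worlds to extensions, and two intensions count as the same if they agree in every world; the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Polygon:
    name: str
    sides: int
    angles: int  # assumption: in every possible world, angles == sides


World = FrozenSet[Polygon]
Intension = Callable[[World], FrozenSet[Polygon]]  # world -> extension in that world

POOL = [Polygon("p1", 3, 3), Polygon("p2", 4, 4), Polygon("p3", 3, 3), Polygon("p4", 5, 5)]
# Hypothetical space of possible worlds: every subset of the pool (2**4 = 16 worlds).
WORLDS: List[World] = [frozenset(c) for r in range(len(POOL) + 1) for c in combinations(POOL, r)]


def trilateral(w: World) -> FrozenSet[Polygon]:
    """Defined by sides: the three-sided figures of world w."""
    return frozenset(p for p in w if p.sides == 3)


def triangle(w: World) -> FrozenSet[Polygon]:
    """Defined by angles: the three-angled figures of world w."""
    return frozenset(p for p in w if p.angles == 3)


def quadrilateral(w: World) -> FrozenSet[Polygon]:
    """The four-sided figures of world w."""
    return frozenset(p for p in w if p.sides == 4)


def same_intension(f: Intension, g: Intension) -> bool:
    """Possible-worlds identity: f and g have the same extension in every world."""
    return all(f(w) == g(w) for w in WORLDS)


print(len(WORLDS))                               # 16
print(same_intension(trilateral, triangle))      # True, although the definitions differ
print(same_intension(triangle, quadrilateral))   # False
```

Any two definitions that are necessarily co-extensional, here by the stipulation that angles equal sides, are indistinguishable under this criterion, which is the disadvantage Guarino acknowledges.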

In all these possible worlds approaches, intensional notions are once more either reduced to extensional set-theoretic constructs across a diversity of worlds, or treated as non-logical notions left unexplained. So, an adequate presentation of a concept theory has to take into account both the formal (logical) and the contentual (epistemic) aspects of concepts and their relationships.

#### **5.2 Nominalism, conceptualism, and conceptual realism (Platonism)**

In philosophy, ontology is a part of metaphysics,<sup>9</sup> which aims to answer at least the following three questions:

1. What is there?
2. What is it, that there is?
3. How is that, that there is?

The first question (1) is perhaps the most difficult one, as it asks what elements the world is made up of, or rather, what the building blocks are from which the world is composed. A traditional answer to this question is that the world consists of things and properties (and relations). An alternative answer can be found in Wittgenstein's Tractatus 1.1: "The world is the totality of facts, not of things", that is to say, the world consists of facts.

The second question (2) concerns the basic stuff from which the world is made. The world could be made out of one kind of stuff only, for example, water, as Thales suggested, or it may be made out of two or more different kinds of stuff, for example, mind and matter.

The third question (3) concerns the mode of existence. Answers to this question could be the following ones, according to which something exists in the sense that:

a. it has some kind of concrete space-time existence;

b. it has some kind of abstract (mental) existence;

c. it has some kind of transcendental existence, in the sense that it extends beyond space-time existence.

The most crucial ontological question concerning concepts and intensionality is: "What modes of existence may concepts have?" The traditional answers to it are that

i. concepts are merely predicate expressions of some language, i.e. they exist concretely (nominalism);

ii. concepts exist in the sense that we have the socio-biological cognitive capacity to identify, classify, and characterize or perceive relationships between things in various ways, i.e. they exist abstractly (conceptualism);

iii. concepts exist independently of both language and human cognition, i.e. transcendentally (conceptual realism, Platonism).

If concepts exist only concretely as linguistic terms, then there are only extensional relationships between them. If concepts exist abstractly as a cognitive capacity, then conceptualization is a private activity done by the human mind. If concepts exist transcendentally, independently of both language and human cognition, then we face the problem of how to acquire knowledge of them. Thus, the ontological question of the mode of existence of concepts is a deep philosophical issue. However, if we make an ontological commitment to a certain view of the mode of existence of concepts, we consequently make other ontological commitments as well. For example, realism about concepts is usually connected with realism about the world as well. In conceptualism we are more or less creating our world by conceptualization, and in nominalism there is room neither for intensionality nor for abstract (or transcendental) entities like numbers.

<sup>9</sup>Nowadays there are two senses of the word "ontology": the traditional one, which we may call the philosophical view, and the more modern one used in the area of information systems, which we may call the knowledge representational view (see Palomäki 2009).

#### **6. Conclusion**

In the above analysis of the different senses of the IS-A relation in Section 2, we took as our starting point Brachman's analysis of it (Brachman 1983), to which we gave a further analysis in order to show that in most of those analyses the IS-A relation is interpreted as an extensional relation, which can be given a set-theoretical interpretation. However, for some of Brachman's instances we were not able to give an appropriate set-theoretical interpretation, and those were the instances concerning concepts. Accordingly, in Section 3 we turned to an analysis of the IS-IN relation following the Aristotelian-Leibnizian approach, and gave it an intensional interpretation; that is, the IS-IN relation is an intensional relation between concepts. A formal presentation of the basic relations between terms, concepts, classes (or sets), and things was given in Section 4, together with the basic axioms of the intensional concept theory *KC*. In the last section, Section 5, some of the basic differences between the IS-IN relation and the IS-A relation were drawn.

So, in this Chapter we maintain that an IS-IN relation is not equal to an IS-A relation; more specifically, that Brachman's analysis of an *extensional* IS-A relation in his basic article "What IS-A Is and Isn't: An Analysis of Taxonomic Links in Semantic Networks" (1983) did not include an *intensional* IS-IN relation. However, we do not maintain that Brachman's analysis of the IS-A relation is wrong, or that there is some flaw in it, but that the IS-IN relation is different from the IS-A relation. Accordingly, we propose that the IS-IN relation is a conceptual relation between concepts and is basically an intensional relation, whereas the IS-A relation is to be reserved for extensional use only.

Provided that there are differences between the intensional and extensional views when constructing hierarchical semantic networks, we are not allowed to identify concepts with their extensions. Moreover, in that case we must distinguish the intensional IS-IN relation between concepts from the extensional IS-A relation between the extensions of concepts. However, only a thoroughgoing nominalist would identify concepts with their extensions, whereas for all others this distinction is necessarily present.

#### **7. References**

Adams, R. M. (1994). *Leibniz: Determinist, Theist, Idealist*. New York, Oxford: Oxford University Press.

Aristotle (1930). *Physics*. Trans. R. P. Hardie and R. K. Gaye, in *The Works of Aristotle*, Vol. 2, ed. W. D. Ross. Oxford: Clarendon Press.

Brachman, R. J. (1983). What IS-A Is and Isn't: An Analysis of Taxonomic Links in Semantic Networks. *IEEE Computer* 16(10), pp. 30-36.

Duzi, M., Jespersen, B. & Materna, P. (2010). *Procedural Semantics for Hyperintensional Logic*. Berlin etc.: Springer-Verlag.

Ganter, B. & Wille, R. (1998). *Formal Concept Analysis: Mathematical Foundations*. Berlin etc.: Springer-Verlag.

Guarino, N. (1998). Formal Ontology in Information Systems. In *Formal Ontology in Information Systems. Proceedings of FOIS'98*. Ed. N. Guarino. Trento, Italy, 6-8 June 1998. Amsterdam, Washington, Tokyo: IOS Press, pp. 3-15.

Hintikka, J. (1969). *Models for Modalities*. Dordrecht: D. Reidel.

Kangassalo, H. (1992/93). COMIC: A system and methodology for conceptual modelling and information construction. *Data and Knowledge Engineering* 9, pp. 287-319.

Kangassalo, H. (2007). Approaches to the Active Conceptual Modelling of Learning. In *ACM-L 2006*. LNCS 4512. Eds. P.P. Chen and L.Y. Wong. Berlin etc.: Springer-Verlag, pp. 168–193.

Kauppi, R. (1960). *Über die Leibnizsche Logik mit besonderer Berücksichtigung des Problems der Intension und der Extension*. Acta Philosophica Fennica, Fasc. XII. Helsinki: Societas Philosophica Fennica.

Kauppi, R. (1967). *Einführung in die Theorie der Begriffssysteme*. Acta Universitatis Tamperensis, Ser. A. Vol. 15. Tampere: University of Tampere.

Leibniz, G. W. (*after* 1690a). A Study Paper on 'Some Logical Difficulties'. In *Logical Papers: A Selection*. Trans. G. H. R. Parkinson. Oxford: Clarendon Press, 1966, pp. 115-121.

Leibniz, G. W. (*after* 1690b). A Study in the Calculus of Real Addition. In *Logical Papers: A Selection*. Trans. G. H. R. Parkinson. Oxford: Clarendon Press, 1966, pp. 131-144.

Leibniz, G. W. (1997). *Philosophical Writings*. Ed. G. H. R. Parkinson. Trans. M. Morris and G. H. R. Parkinson. London: The Everyman Library.

Montague, R. (1974). *Formal Philosophy*. Ed. R. Thomason. New Haven and London: Yale University Press.

Nebel, B. & Smolka, G. (1990). Representation and Reasoning with Attributive Descriptions. In *Sorts and Types in Artificial Intelligence*. Eds. Bläsius, K. H., Hedstück, U., and Rollinger, C. R. Lecture Notes in Computer Science 418. Berlin etc.: Springer-Verlag, pp. 112-139.

Palomäki, J. (1994). *From Concepts to Concept Theory: Discoveries, Connections, and Results*. Acta Universitatis Tamperensis, Ser. A. Vol. 416. Tampere: University of Tampere.

Palomäki, J. (1997). Three Kinds of Containment Relations of Concepts. In *Information Modelling and Knowledge Bases VIII*. Eds. H. Kangassalo, J.F. Nilsson, H. Jaakkola, and S. Ohsuga. Amsterdam, Berlin, Oxford, Tokyo, Washington, DC: IOS Press, pp. 261-277.

Palomäki, J. (2009). Ontology Revisited: Concepts, Languages, and the World(s). In *Databases and Information Systems V – Selected Papers from the Eighth International Baltic Conference, DB&IS 2008*. Eds. H.-M. Haav and A. Kalja. Amsterdam, Berlin, Tokyo, Washington D.C.: IOS Press, pp. 3-13.

Wittgenstein, L. (1921). *Tractatus Logico-Philosophicus*. Trans. D. F. Pears and B. F. McGuinness. London: Routledge and Kegan Paul, 1961.

Woods, W. A. (1991). Understanding Subsumption and Taxonomy: A Framework for Progress. In *Principles of Semantic Networks – Explanations in the Representation of Knowledge*. Ed. J. Sowa. San Mateo, CA: Morgan Kaufmann Publishers, pp. 45-94.

## **K-Relations and Beyond**

Melita Hajdinjak and Andrej Bauer *University of Ljubljana Slovenia*

#### **1. Introduction**

Although the theory of relational databases is highly developed and proves its usefulness in practice every day Garcia-Molina et al. (2008), there are situations where the relational model fails to offer adequate formal support. This is the case, for instance, when querying *approximate data* Hjaltason & Brooks (2003); Minker (1998), or data within a given range of distance or *similarity* Hjaltason & Brooks (2003); Patella & Ciaccia (2009). Examples of such similarity-search applications are databases storing images, fingerprints, audio clips or time sequences, text databases with typographical or spelling errors, and text databases where we look for documents that are similar to a given document. A core component of such *cooperative* systems is the treatment of imprecise data Hajdinjak & Mihelič (2006); Minker (1998).

At the heart of a cooperative database system is a database where the data domains come equipped with a *similarity relation*, to denote degrees of similarity rather than simply 'equal' and 'not equal'. This notion of similarity leads to an extension of the relational model where data can be annotated with, for instance, boolean formulas (as in incomplete databases) Calì et al. (2003); Van der Meyden (1998), membership degrees (as in fuzzy databases) Bordogna & Psaila (2006); Yazici & George (1999), event tables (as in probabilistic databases) Suciu (2008), timestamps (as in temporal databases) Jae & Elmasri (2001), sets of contributing tuples (as in the context of data warehouses and the computation of lineages or why-provenance) Cui et al. (2000); Green et al. (2007), or numbers representing the multiplicity of tuples (as in the context of bag semantics) Montagna & Sebastiani (2001). Querying such *annotated* or *tagged relations* involves the generalization of the classical relational algebra to perform corresponding operations on the annotations (tags).
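As a rough illustration of what "corresponding operations on the annotations" can mean, the sketch below represents a tagged relation as a Python dictionary from tuples to tags and implements union by combining the tags of a tuple that occurs in both operands. The function name `tagged_union` and the three tag-combining operations are our illustrative choices, not definitions from this chapter: integer addition gives bag multiplicities, `max` on membership degrees gives the usual fuzzy union, and logical `or` gives ordinary set semantics. A tuple missing from the dictionary is treated as absent.

```python
import operator
from typing import Callable, Dict, Tuple, TypeVar

Tag = TypeVar("Tag")
Row = Tuple[str, ...]


def tagged_union(r: Dict[Row, Tag], s: Dict[Row, Tag],
                 plus: Callable[[Tag, Tag], Tag]) -> Dict[Row, Tag]:
    """Union of two tagged relations; tags of a row present in both are combined with `plus`."""
    out = dict(r)
    for row, tag in s.items():
        # A row absent from `out` contributes nothing, so its tag is simply copied.
        out[row] = plus(out[row], tag) if row in out else tag
    return out


# Bag semantics: tags are multiplicities, combined by addition.
bags = tagged_union({("alice",): 2, ("bob",): 1}, {("alice",): 1}, operator.add)

# Fuzzy semantics: tags are membership degrees in [0, 1], combined by max.
fuzzy = tagged_union({("red",): 0.8}, {("red",): 0.5, ("pink",): 0.3}, max)

# Classical set semantics: tags are truth values, combined by logical or.
sets = tagged_union({("x",): True}, {("x",): True, ("y",): True}, operator.or_)

print(bags)   # {('alice',): 3, ('bob',): 1}
print(fuzzy)  # {('red',): 0.8, ('pink',): 0.3}
print(sets)   # {('x',): True, ('y',): True}
```

In the K-relation model discussed below, the role of `plus` is played uniformly by the sum ⊕ of the chosen annotation semiring, rather than being passed in ad hoc as here.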

There have been many attempts to define extensions of the relational model to deal with similarity querying. Most utilize fuzzy logic Zadeh (1965), and the annotations are typically modelled by a membership function to the unit interval, [0, 1] Ma (2006); Penzo (2005); Rosado et al. (2006); Schmitt & Schulz (2004), although there are generalizations where the membership function instead maps to an algebraic structure of some kind (typically poset or lattice based) Belohlávek & V. Vychodil (2006); Peeva & Kyosev (2004); Shenoi & Melton (1989). Green et al. Green et al. (2007) proposed a general data model (referred to as the K*-relation model*) for annotated relations. In this model tuples in a relation are annotated with a value taken from a *commutative semiring*, K. The resulting positive relational algebra, RA<sup>+</sup> K, generalizes Codd's classic relational algebra Codd (1970), the bag algebra Montagna & Sebastiani (2001), the relational algebra on *c*-tables Imielinski & Lipski (1984), the probabilistic algebra on event tables Suciu (2008), and the provenance algebra Buneman et al. (2001); Cui et al. (2000). With relatively little work, the K-relation model is also suitable as a basis for

Poggi (2010); Green et al. (2007); Hajdinjak & Bierman (2011), we adopt the named-attribute

K-Relations and Beyond 21

Consider generalized relations in which the tuples are annotated (tagged) with information of various kinds. A notationally convenient way of working with annotated relations is to model tagging by a function on all possible tuples. Green et al. Green et al. (2007) argue that the generalization of the positive relational algebra to annotated relations requires that the set

is an algebraic structure with two binary operations (sum ⊕ and product �) and two distinguished elements (**<sup>0</sup>** �<sup>=</sup> **<sup>1</sup>**) such that (*K*, <sup>⊕</sup>, **<sup>0</sup>**) is a commutative monoid<sup>1</sup> with identity element 0, (*K*, �, **1**) is a monoid with identity element **1**, products distribute over sums, and **0** � *a* = *a* � **0** = **0** for any *a* ∈ *K* (i.e., **0** is an annihilating element). A semiring K is called

**Definition 2.1** (K-relation Green et al. (2007))**.** *Let* K = (*K*, ⊕, �, **0**, **1**) *be a commutative semiring. A* K*-relation over a schema U* = {*a*<sup>1</sup> : *τ*1,..., *an* : *τn*} *is a function A*: *U-Tup* → *K such that its*

Taking this extension of relations, Green et al. proposed a natural lifting of the classical relational operators over K-relations. The tuples considered to be 'in' the relation are tagged with **1** and the tuples considered to be 'out of' the relation are tagged with **0**. The binary operation ⊕ is used to deal with union and projection and therefore to combine different tags of the same tuple into one tag. The binary operation � is used to deal with natural join and

**Definition 2.2** (Positive relational algebra on K-relations Green et al. (2007))**.** *Suppose* K = (*K*, ⊕, �, **0**, **1**) *is a commutative semiring. The operations of the positive relational algebra on* K*,*

<sup>1</sup> A monoid consists of a set equipped with a binary operation that is associative and has an identity

is a finite map from *attribute names ai* to their types or domains

from attribute names *ai* to values *vi* of the corresponding domain, i.e.,

where *vi* ∈ *τ<sup>i</sup>* for *i* = 1, . . . , *n*. We denote the set of all *U*-tuples by *U*-Tup.

K

*U* = {*a*<sup>1</sup> : *τ*1,..., *an* : *τn*}, (1)

*t* = {*a*<sup>1</sup> : *v*1,..., *an* : *vn*} (3)

*U*(*ai*) = *τi*. (2)

*t*(*ai*) = *vi*, (4)

K = (*K*, ⊕, �, **0**, **1**) (5)

supp(*A*) = {*t* | *A*(*t*) �= **0**}, (6)

approach, so a *schema*,

We represent an *U-tuple* as a map

**2.1 Positive relational algebra RA**<sup>+</sup>

of tags is a *commutative semiring*.

commutative if monoid (*K*, �, **1**) is commutative.

therefore to combine the tags of joinable tuples.

K*, are defined as follows:*

Recall that a *semiring*

*support,*

*is finite.*

*denoted RA*<sup>+</sup>

element.

modelling data with similarities and simple, positive similarity queries Hajdinjak & Bierman (2011).

Geerts and Poggi Geerts & Poggi (2010) extended the positive relational algebra RA<sup>+</sup> K with a difference operator, which required restricting the class of commutative semirings to commutative semirings with *monus* or *m-semirings*. Because the monus-based difference operator yielded the wrong answer for two semirings important for similarity querying, a different approach to modelling negative queries in the K-relation model was proposed Hajdinjak & Bierman (2011). It required restricting the class of commutative semirings to commutative semirings with *negation* or *n-semirings*. In order to satisfy *all* of the classical relational identities (including the idempotence of union and self-join), Hajdinjak and Bierman Hajdinjak & Bierman (2011) made another restriction; for the annotation structure they chose *De Morgan frames*. In addition, since previous attempts to formalize similarity querying and the K-relation model all suffered from an expressivity problem allowing only one annotation structure per relation (every tuple is annotated with a value), the D*-relation* model was proposed in which every tuple is annotated with a tuple of values, one per attribute, rather than a single value.

Relying on the work on K-, L- and D-relations, we take some further steps towards a general model of annotated relations. We come to the conclusion that complete distributive lattices with finite meets distributing over arbitrary joins may be chosen as a general annotation structure. This choice covers the classical relations Codd (1970), relations on bag semantics Green et al. (2007); Montagna & Sebastiani (2001), Fuhr-Rölleke-Zimányi probabilistic relations Suciu (2008), provenance relations Cui et al. (2000); Green et al. (2007), Imielinski-Lipski relations on *c*-tables Imielinski & Lipski (1984), and fuzzy relations Hajdinjak & Bierman (2011); Rosado et al. (2006). We also aim to define a general framework of K-, L- and D-relations in which all the previously considered kinds of annotated relations are modeled correctly. Our studies result in an attribute-annotated model of so-called C-relations, which leaves some freedom of choice in defining the relational operations.

This chapter is organized as follows. In §2 we recall the definitions of K-relations and the positive relational algebra RA<sup>+</sup><sub>K</sub>, along with RA<sup>+</sup><sub>K</sub>(\), its extension to support negative queries. Section 3 recalls the definition of the tuple-annotated L-relation model, the aim of which was to include similarity relations into the K-relation framework of annotated relations. In §4 we present the attribute-annotated D-relation model, where every attribute is associated with its own annotation domain, and we study the properties of the resulting calculus of relations. In §5 we explore whether there is a common domain of annotations suitable for all forms of annotated relations, and we define a general C-relation model. The final section, §6, discusses the issue of ranking the annotated answers and gives some guidelines for future work.

## **2. The** K**-relation model**

In this section we recall the definitions of K-relations and the positive relational algebra RA<sup>+</sup><sub>K</sub>, along with RA<sup>+</sup><sub>K</sub>(\), its extension to support negative queries. The aim of the K-relation work was to provide a generalized framework capable of capturing various forms of annotated relations.

We first assume some base domains, or *types*, commonly written as *τ*, which are simply sets of ground values, such as integers and strings. Like the authors of previous work Geerts & Poggi (2010); Green et al. (2007); Hajdinjak & Bierman (2011), we adopt the named-attribute approach, so a *schema*,

$$\mathcal{U} = \{a\_1 \colon \tau\_1, \dots, a\_n \colon \tau\_n\},\tag{1}$$

is a finite map from attribute names *a*<sub>*i*</sub> to their types or domains

$$\mathcal{U}(a\_i) = \tau\_i. \tag{2}$$

We represent a *U-tuple* as a map

$$t = \{a\_1 \colon v\_1, \dots, a\_n \colon v\_n\} \tag{3}$$

from attribute names *a*<sub>*i*</sub> to values *v*<sub>*i*</sub> of the corresponding domain, i.e.,

$$t(a\_i) = v\_i, \tag{4}$$

where *v*<sub>*i*</sub> ∈ *τ*<sub>*i*</sub> for *i* = 1, ..., *n*. We denote the set of all *U*-tuples by *U*-Tup.

#### **2.1 Positive relational algebra RA**<sup>+</sup><sub>K</sub>

Consider generalized relations in which the tuples are annotated (tagged) with information of various kinds. A notationally convenient way of working with annotated relations is to model tagging by a function on all possible tuples. Green et al. Green et al. (2007) argue that the generalization of the positive relational algebra to annotated relations requires that the set of tags is a *commutative semiring*.

Recall that a *semiring*

$$\mathcal{K} = (K, \oplus, \odot, \mathbf{0}, \mathbf{1}) \tag{5}$$

is an algebraic structure with two binary operations (sum ⊕ and product ⊙) and two distinguished elements (**0** ≠ **1**) such that (*K*, ⊕, **0**) is a commutative monoid<sup>1</sup> with identity element **0**, (*K*, ⊙, **1**) is a monoid with identity element **1**, products distribute over sums, and **0** ⊙ *a* = *a* ⊙ **0** = **0** for any *a* ∈ *K* (i.e., **0** is an annihilating element). A semiring K is called commutative if the monoid (*K*, ⊙, **1**) is commutative.
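As an illustration only, the following Python sketch encodes a semiring as a record of its two operations and two distinguished elements and checks the laws listed above by brute force on a small, finite sample of carrier elements. The names `Semiring`, `check_commutative_semiring`, `BOOL`, `NAT`, `PROB` (with its hypothetical two-event set `OMEGA`) and `NOT_A_SEMIRING` are our own assumptions; passing the check on a sample is evidence, not a proof that the laws hold on the whole carrier.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    """Hypothetical container for a semiring (K, plus, times, zero, one)."""
    plus: Callable[[T, T], T]
    times: Callable[[T, T], T]
    zero: T
    one: T


def check_commutative_semiring(s: Semiring[T], sample: Iterable[T]) -> bool:
    """Test the commutative-semiring laws on all combinations drawn from a finite sample."""
    xs = list(sample)
    for a, b, c in product(xs, repeat=3):
        if s.plus(s.plus(a, b), c) != s.plus(a, s.plus(b, c)):
            return False  # sum must be associative
        if s.times(s.times(a, b), c) != s.times(a, s.times(b, c)):
            return False  # product must be associative
        if s.times(a, s.plus(b, c)) != s.plus(s.times(a, b), s.times(a, c)):
            return False  # products distribute over sums
    for a, b in product(xs, repeat=2):
        if s.plus(a, b) != s.plus(b, a) or s.times(a, b) != s.times(b, a):
            return False  # both monoids must be commutative
    for a in xs:
        if s.plus(a, s.zero) != a or s.times(a, s.one) != a:
            return False  # 0 and 1 must be identities
        if s.times(a, s.zero) != s.zero:
            return False  # 0 must annihilate
    return s.zero != s.one  # the definition requires 0 != 1


OMEGA = frozenset({"e1", "e2"})  # hypothetical two-event universe

BOOL = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)
NAT = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)
PROB = Semiring(lambda x, y: x | y, lambda x, y: x & y, frozenset(), OMEGA)
# Not a semiring: 1 is not an identity for min once values exceed 1.
NOT_A_SEMIRING = Semiring(max, min, 0, 1)
```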

**Definition 2.1** (K-relation Green et al. (2007))**.** *Let* K = (*K*, ⊕, ⊙, **0**, **1**) *be a commutative semiring. A* K*-relation over a schema U* = {*a*<sub>1</sub> : *τ*<sub>1</sub>, ..., *a*<sub>*n*</sub> : *τ*<sub>*n*</sub>} *is a function A*: *U-Tup* → *K such that its support,*

$$\text{supp}(A) = \{ t \mid A(t) \neq \mathbf{0} \},\tag{6}$$

*is finite.*
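One way to store a K-relation in code, sketched here under our own assumptions, is to keep only its finite support: a dictionary from U-tuples to non-zero annotations, with every other tuple implicitly tagged **0**. The class `KRelation` and the helper `utuple` are hypothetical names, and the sketch does not check tuples against a schema U.

```python
from typing import Any, Dict, FrozenSet, Mapping, Tuple

# A U-tuple {a1: v1, ..., an: vn}, stored as a hashable set of (attribute, value) pairs.
UTuple = FrozenSet[Tuple[str, Any]]


def utuple(**values: Any) -> UTuple:
    """Build a U-tuple from keyword arguments, e.g. utuple(name="ann", dept="db")."""
    return frozenset(values.items())


class KRelation:
    """Hypothetical encoding of a K-relation A: U-Tup -> K through its finite support."""

    def __init__(self, zero: Any, entries: Mapping[UTuple, Any]) -> None:
        self.zero = zero
        # Zero-annotated entries are discarded, so only supp(A) is ever stored.
        self.entries: Dict[UTuple, Any] = {t: k for t, k in entries.items() if k != zero}

    def __call__(self, t: UTuple) -> Any:
        # Every tuple outside the stored support is implicitly annotated with 0.
        return self.entries.get(t, self.zero)

    def support(self) -> FrozenSet[UTuple]:
        return frozenset(self.entries)


# Counting semiring (bag semantics): annotations are multiplicities and zero is 0.
staff = KRelation(0, {utuple(name="ann", dept="db"): 2, utuple(name="bob", dept="ai"): 0})
```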

Taking this extension of relations, Green et al. proposed a natural lifting of the classical relational operators over K-relations. The tuples considered to be 'in' the relation are tagged with **1** and the tuples considered to be 'out of' the relation are tagged with **0**. The binary operation ⊕ is used to deal with union and projection and therefore to combine different tags of the same tuple into one tag. The binary operation ⊙ is used to deal with natural join and therefore to combine the tags of joinable tuples.

**Definition 2.2** (Positive relational algebra on K-relations Green et al. (2007))**.** *Suppose* K = (*K*, ⊕, ⊙, **0**, **1**) *is a commutative semiring. The operations of the positive relational algebra on* K*, denoted RA*<sup>+</sup><sub>K</sub>*, are defined as follows:*

<sup>1</sup> A monoid consists of a set equipped with a binary operation that is associative and has an identity element.


K-Relations and Beyond 23


**Empty relation:** *For any set of attributes U, there is* ∅<sub>*U*</sub> : *U-Tup* → *K such that*

$$\emptyset\_{\mathcal{U}}(t) \stackrel{\text{def}}{=} \mathbf{0} \tag{7}$$

*for all U-tuples t.*<sup>2</sup>

**Union:** *If A*, *B*: *U-Tup* → *K, then A* ∪ *B*: *U-Tup* → *K is defined by*

$$(A \cup B)(t) \stackrel{\text{def}}{=} A(t) \oplus B(t). \tag{8}$$

**Projection:** *If A*: *U-Tup* → *K and V* ⊂ *U, we write f* ↓ *V for the restriction of the map f to the domain V. The projection π*<sub>*V*</sub>*A*: *V-Tup* → *K is defined by*

$$(\pi\_V A)(t) \stackrel{\text{def}}{=} \sum\_{\substack{(t' \downarrow V) = t \text{ and } A(t') \neq \mathbf{0}}} A(t'). \tag{9}$$

**Selection:** *If A*: *U-Tup* → *K and the selection predicate* **P** *maps each U-tuple to either* **0** *or* **1***, then σ*<sub>**P**</sub>*A* : *U-Tup* → *K is defined by*

$$(\sigma\_{\mathbf{P}} A)(t) \stackrel{\text{def}}{=} A(t) \odot \mathbf{P}(t). \tag{10}$$

**Join:** *If A*: *U*<sub>1</sub>*-Tup* → *K and B*: *U*<sub>2</sub>*-Tup* → *K, then A* ⋈ *B is the* K*-relation over U*<sub>1</sub> ∪ *U*<sub>2</sub> *defined by*

$$(A \bowtie B)(t) \stackrel{\text{def}}{=} A(t \downarrow \mathcal{U}\_1) \odot B(t \downarrow \mathcal{U}\_2). \tag{11}$$

**Renaming:** *If A*: *U-Tup* → *K and β* : *U* → *U*′ *is a bijection, then ρ*<sub>*β*</sub>*A*: *U*′*-Tup* → *K is defined by*

$$(\rho\_{\beta}A)(t) \stackrel{\text{def}}{=} A(t \circ \beta). \tag{12}$$

Note that in the case for projection, the sum is finite since *A* has finite support.
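The operators above can be sketched generically in Python by passing the semiring in as a parameter. This is a minimal illustration, assuming relations are stored by their support as dictionaries from U-tuples (frozensets of attribute/value pairs) to annotations; every function name is ours, not from Green et al., and schemas are implicit in the tuples rather than checked.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Mapping, NamedTuple, Tuple


class Semiring(NamedTuple):
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


UTuple = FrozenSet[Tuple[str, Any]]
KRel = Dict[UTuple, Any]  # only the support is stored; absent tuples are tagged zero


def _prune(K: Semiring, rel: KRel) -> KRel:
    # Drop zero-tagged tuples so the dictionary keeps representing the support.
    return {t: k for t, k in rel.items() if k != K.zero}


def restrict(t: UTuple, attrs: Iterable[str]) -> UTuple:
    """t ↓ V: keep only the attributes in V."""
    keep = set(attrs)
    return frozenset((a, v) for a, v in t if a in keep)


def union(K: Semiring, A: KRel, B: KRel) -> KRel:
    """Eq. (8): (A ∪ B)(t) = A(t) ⊕ B(t)."""
    out = dict(A)
    for t, k in B.items():
        out[t] = K.plus(out.get(t, K.zero), k)
    return _prune(K, out)


def project(K: Semiring, A: KRel, attrs: Iterable[str]) -> KRel:
    """Eq. (9): ⊕-sum the tags of all tuples in supp(A) that restrict to the same t."""
    keep = list(attrs)
    out: KRel = {}
    for t, k in A.items():
        s = restrict(t, keep)
        out[s] = K.plus(out.get(s, K.zero), k)
    return _prune(K, out)


def select(K: Semiring, A: KRel, pred: Callable[[Mapping[str, Any]], bool]) -> KRel:
    """Eq. (10): multiply by P(t), which is 1 when pred holds and 0 otherwise."""
    return _prune(K, {t: K.times(k, K.one if pred(dict(t)) else K.zero) for t, k in A.items()})


def join(K: Semiring, A: KRel, B: KRel) -> KRel:
    """Eq. (11): tuples agreeing on shared attributes combine their tags with ⊙."""
    out: KRel = {}
    for t1, k1 in A.items():
        d1 = dict(t1)
        for t2, k2 in B.items():
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out[t1 | t2] = K.times(k1, k2)
    return _prune(K, out)


def rename(A: KRel, beta: Mapping[str, str]) -> KRel:
    """Eq. (12): beta is a bijection from old to new attribute names."""
    return {frozenset((beta[a], v) for a, v in t): k for t, k in A.items()}


NAT = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)  # bag semantics
```

With `NAT` the code behaves like bag semantics (projection adds multiplicities); passing a boolean semiring built from `or`, `and`, `False`, `True` instead yields set semantics from the same functions.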

The power of this definition is that it generalizes a number of proposals for annotated relations and associated query algebras.

**Lemma 2.1** (Example algebras on K-relations Green et al. (2007))**.**

*1. The classical relational algebra with set semantics Codd (1970) is given by the* K*-relational algebra on the boolean semiring* K<sub>**B**</sub> = (**B**, ∨, ∧, false, true)*.*

*2. The relational algebra with bag semantics Green et al. (2007); Montagna & Sebastiani (2001) is given by the* K*-relational algebra on the semiring of counting numbers* K<sub>**N**</sub> = (**N**, +, ·, 0, 1)*.*

*3. The Fuhr-Rölleke-Zimányi probabilistic relational algebra on event tables Suciu (2008) is given by the* K*-relational algebra on the semiring* K<sub>*prob*</sub> = (P(Ω), ∪, ∩, ∅, Ω) *where* Ω *is a finite set of events and* P(Ω) *is the powerset of* Ω*.*

*4. The Imielinski-Lipski algebra on c-tables Imielinski & Lipski (1984) is given by the* K*-relational algebra on the semiring* K<sub>*c-table*</sub> = (PosBool(*X*), ∨, ∧, false, true) *where* PosBool(*X*) *is the set of all positive boolean expressions over a finite set of variables X in which any two equivalent expressions are identified.*

*5. The provenance algebra of polynomials with variables from X and coefficients from* **N** *Cui et al. (2000); Green et al. (2007) is given by the* K*-relational algebra on the provenance semiring* K<sub>*prov*</sub> = (**N**[*X*], +, ·, 0, 1)*.*

<sup>2</sup> As is standard, we drop the subscript on the empty relation where it can be inferred by context.
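To illustrate items 1 to 3, the Python sketch below runs one projection over three semirings. It is a simplification under our own assumptions: rows are positional tuples rather than named U-tuples, the event set `OMEGA` is hypothetical, and only the ⊕ side of each semiring is exercised.

```python
from typing import Any, Callable, Dict, Hashable, NamedTuple, Tuple


class Semiring(NamedTuple):
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


OMEGA = frozenset({"e1", "e2", "e3"})  # hypothetical finite set of events

BOOL = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)      # item 1: sets
NAT = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)                 # item 2: bags
PROB = Semiring(lambda x, y: x | y, lambda x, y: x & y, frozenset(), OMEGA)  # item 3: events


def project_first(K: Semiring, rel: Dict[Tuple[Hashable, ...], Any]) -> Dict[Hashable, Any]:
    """Project positional rows onto their first column, combining tags with K.plus."""
    out: Dict[Hashable, Any] = {}
    for row, tag in rel.items():
        out[row[0]] = K.plus(out.get(row[0], K.zero), tag)
    return {key: tag for key, tag in out.items() if tag != K.zero}


rows = [("ann", "db"), ("ann", "ai"), ("bob", "db")]
as_set = project_first(BOOL, dict.fromkeys(rows, True))   # {"ann": True, "bob": True}
as_bag = project_first(NAT, dict(zip(rows, [2, 1, 4])))   # {"ann": 3, "bob": 4}
as_events = project_first(
    PROB, dict(zip(rows, [frozenset({"e1"}), frozenset({"e2"}), frozenset({"e3"})]))
)  # {"ann": frozenset({"e1", "e2"}), "bob": frozenset({"e3"})}
```

The query code is identical in all three cases; only the semiring argument changes, which is what the lemma expresses.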

The positive relational algebra RA<sup>+</sup><sub>K</sub> satisfies many of the familiar relational equalities Ullman (1988; 1989).

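**Proposition 2.1** (Identities of K-relations Green et al. (2007); Hajdinjak & Bierman (2011))**.** *The following identities hold for the positive relational algebra on* K*-relations:*

*• union is associative, commutative, and has identity* ∅*;*

*• join is associative, commutative and distributive over union;*

*• selections and projections commute with each other;*

*• selection distributes over union and product;*

*• projection distributes over union and join;*

*• selection with boolean predicates gives all or nothing,* σ<sub>false</sub>(*A*) = ∅ *and* σ<sub>true</sub>(*A*) = *A*;

*• join with an empty relation gives an empty relation, A* ⋈ ∅<sub>*U*</sub> = ∅<sub>*U*</sub> *where A is a* K*-relation over a schema U;*

*• projection of an empty relation gives an empty relation,* π<sub>*V*</sub>(∅) = ∅*.*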
It is important to note that the properties of idempotence of union, *A* ∪ *A* = *A*, and of self-join, *A* ⋈ *A* = *A*, are missing from this list. These properties fail for bag semantics and for provenance, so they cannot hold in the general model.
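A short check makes the failure concrete. The sketch below (our own helpers, assuming relations stored as dictionaries from tuples to non-zero tags) computes A ∪ A and A ⋈ A for one relation over the boolean semiring and one over the counting semiring; no zero-pruning is needed because the tags stay non-zero.

```python
from typing import Any, Callable, Dict, Hashable

Tags = Dict[Hashable, Any]


def self_union(plus: Callable[[Any, Any], Any], A: Tags) -> Tags:
    """A ∪ A: every tuple's tag is ⊕-added to itself."""
    return {t: plus(k, k) for t, k in A.items()}


def self_join(times: Callable[[Any, Any], Any], A: Tags) -> Tags:
    """A ⋈ A on identical schemas: each tuple joins only with itself, so its tag is ⊙-squared."""
    return {t: times(k, k) for t, k in A.items()}


A_set = {("ann",): True, ("bob",): True}  # boolean semiring
A_bag = {("ann",): 3, ("bob",): 1}        # counting semiring
```

Under the boolean semiring both results equal `A_set`; under the counting semiring the tags become 2k and k², so idempotence fails.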

Green et al. only considered positive queries and left open the problem of supporting negative query operators.

#### **2.2 Relational algebra RA**<sup>+</sup><sub>K</sub>(\)

Geerts and Poggi Geerts & Poggi (2010) recently proposed extending the K-relation model by a difference operator following a standard approach for introducing a monus operator into an additive commutative monoid Amer (1984). First, they restricted the class of commutative semirings by requiring that every semiring additionally satisfy the following pair of conditions.

**Definition 2.3** (GP-conditions Geerts & Poggi (2010))**.** *A commutative semiring* K = (*K*, ⊕, ⊙, **0**, **1**) *is said to satisfy the* GP conditions *if the following two conditions hold.*

*1. The preorder x* ⪯ *y on K defined as*

$$x \preceq y \iff \text{there exists a } z \in K \text{ such that } x \oplus z = y \tag{13}$$

*is a* partial order*.*<sup>3</sup>

*2. For each pair of elements x*, *y* ∈ *K, the set* {*z* ∈ *K*; *x* ⪯ *y* ⊕ *z*} *has a smallest element. (As* ⪯ *defines a partial order, this smallest element must be unique, if it exists.)*

**Definition 2.4** (*m*-semiring Geerts & Poggi (2010))**.** *Let* K = (*K*, ⊕, ⊙, **0**, **1**) *be a commutative semiring that satisfies the GP conditions. For any x*, *y* ∈ *K, we define x* ⊖ *y to be the smallest element z such that x* ⪯ *y* ⊕ *z. A (commutative) semiring* K *that can be equipped with a* monus *operator* ⊖ *is called a* semiring with monus *or m*-semiring*.*

<sup>3</sup> While a *preorder* is a binary relation that is reflexive and transitive, a *partial order* is a binary relation that is reflexive, transitive, and antisymmetric.
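Definition 2.4 can be read as a search problem. The Python sketch below, which is our own illustration rather than Geerts and Poggi's construction, finds x ⊖ y exhaustively over a finite carrier, given the natural preorder of Eq. (13) as a function `leq`. For **N** it searches the finite slice `range(20)`, an assumption that only works while the arguments are small; the names `monus` and `bool_leq` are hypothetical.

```python
import operator
from typing import Any, Callable, Iterable, Optional


def monus(carrier: Iterable[Any], plus: Callable[[Any, Any], Any],
          leq: Callable[[Any, Any], bool], x: Any, y: Any) -> Optional[Any]:
    """Return x ⊖ y, the least z in `carrier` with x ⪯ y ⊕ z (Definition 2.4).

    `leq` must implement the natural preorder of Eq. (13) on the carrier.
    Returns None when the candidate set has no least element (GP condition 2 fails).
    """
    candidates = [z for z in carrier if leq(x, plus(y, z))]
    least = [z for z in candidates if all(leq(z, w) for w in candidates)]
    return least[0] if least else None


def bool_leq(a: bool, b: bool) -> bool:
    # Natural preorder of (B, or, False): a ⪯ b iff (a or z) == b for some z, i.e. a implies b.
    return (not a) or b


NATS = range(20)  # finite slice of N, large enough for the small arguments used here
nat_example = monus(NATS, operator.add, operator.le, 7, 3)                      # 4
bool_example = monus([False, True], lambda a, b: a or b, bool_leq, True, True)  # False
```

On **N** the search reproduces the truncated minus, and on the booleans it gives true ⊖ true = false.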


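Geerts and Poggi identified two equationally complete classes in the variety of *m*-semirings, namely (1) *m*-semirings that are a boolean algebra (i.e., complemented distributive lattice with distinguished elements **0** and **1**), for which the monus behaves like set difference, and (2) *m*-semirings that are the positive cone of a lattice-ordered commutative ring, for which the monus behaves like the truncated minus of the natural numbers.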


Recall that a *lattice-ordered ring* (or *l*-ring) is an algebraic structure K = (*K*, ∨, ∧, ⊕, −, **0**, ⊙) such that (*K*, ∨, ∧) is a lattice, (*K*, ⊕, −, **0**, ⊙) is a ring, the operation ⊕ is order-preserving, and for *x*, *y* ≥ **0** we have *x* ⊙ *y* ≥ **0**. An *l*-ring is commutative if the multiplication operation ⊙ is commutative. The set of elements *x* for which **0** ≤ *x* is called the *positive cone* of the *l*-ring.

**Lemma 2.2** (Example *m*-semirings Geerts & Poggi (2010))**.**

*1. The boolean semiring,* K<sub>**B**</sub> = (**B**, ∨, ∧, false, true)*, is a boolean algebra. We have*

$$\mathsf{false} \ominus \mathsf{false} = \mathsf{false}, \quad \mathsf{false} \ominus \mathsf{true} = \mathsf{false}, \quad \mathsf{true} \ominus \mathsf{false} = \mathsf{true}, \quad \mathsf{true} \ominus \mathsf{true} = \mathsf{false}. \tag{14}$$

*2. The semiring of counting numbers,* K<sub>**N**</sub> = (**N**, +, ·, 0, 1)*, is the positive cone of the ring of integers,* **Z***. The monus corresponds to the truncated minus,*

$$x \ominus y = \max\{0, x - y\}. \tag{15}$$

*3. The probabilistic semiring,* K<sub>*prob*</sub> = (P(Ω), ∪, ∩, ∅, Ω)*, is a boolean algebra. The monus corresponds to set difference,*

$$X \ominus Y = X \setminus Y. \tag{16}$$

*4. In the case of the semiring of c-tables,* K<sub>*c-table*</sub> = (PosBool(*X*), ∨, ∧, false, true)*, the monus cannot be defined unless negated literals are added to the base set, in which case we get a boolean algebra. For any two expressions φ*<sub>1</sub>, *φ*<sub>2</sub> ∈ *Bool*(*X*) *we then have*

$$\varphi\_1 \ominus \varphi\_2 = \varphi\_1 \land \neg \varphi\_2, \tag{17}$$

*where negation* ¬ *over boolean expressions takes truth to falsity, and vice versa, and it interchanges the meet and the join operation.*

*5. The provenance semiring,* K<sub>*prov*</sub> = (**N**[*X*], +, ·, 0, 1)*, is the positive cone of the ring of polynomials* **Z**[*X*]*. The monus of two polynomials f*[*X*] = ∑<sub>*α*∈*I*</sub> *f*<sub>*α*</sub>*x*<sup>*α*</sup> *and g*[*X*] = ∑<sub>*α*∈*I*</sub> *g*<sub>*α*</sub>*x*<sup>*α*</sup>*, where I is a finite subset of* **N**<sup>*n*</sup>*, corresponds to*

$$f[X] \ominus g[X] = \sum\_{\alpha \in I} (f\_\alpha \mathbin{\dot{-}} g\_\alpha)\, x^\alpha, \tag{18}$$

*where* ∸ *denotes the truncated minus on* **N***.*
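Equation (18) simply applies the truncated minus coefficient by coefficient. The Python sketch below assumes a hypothetical encoding of a polynomial in **N**[*X*] as a dictionary from exponent vectors α to natural coefficients; the name `poly_monus` and the two-variable example are ours.

```python
from typing import Dict, Tuple

# Hypothetical encoding of a polynomial in N[X]: exponent vector alpha -> coefficient f_alpha.
Poly = Dict[Tuple[int, ...], int]


def poly_monus(f: Poly, g: Poly) -> Poly:
    """f[X] ⊖ g[X]: coefficient-wise truncated minus, as in Eq. (18)."""
    result: Poly = {}
    for alpha in f.keys() | g.keys():
        coeff = max(0, f.get(alpha, 0) - g.get(alpha, 0))
        if coeff:  # monomials whose coefficient drops to 0 are omitted
            result[alpha] = coeff
    return result


# Two provenance variables X = (x, y): x -> (1, 0), y -> (0, 1), x*y -> (1, 1).
f = {(1, 0): 2, (1, 1): 1}  # 2x + xy
g = {(1, 0): 1, (0, 1): 3}  # x + 3y
diff = poly_monus(f, g)     # x + xy, i.e. {(1, 0): 1, (1, 1): 1}
```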

Given an *m*-semiring, the positive relational algebra RA<sup>+</sup><sub>K</sub> can be extended with the missing difference operator as follows.

**Definition 2.5** (Relational algebra on K-relations Geerts & Poggi (2010))**.** *Let* K *be an m-semiring. The algebra RA*<sup>+</sup><sub>K</sub>(\) *is obtained by extending RA*<sup>+</sup><sub>K</sub> *with the operator:*

**Difference:** *If A*, *B* : *U-Tup* → *K, then the difference A* \ *B* : *U-Tup* → *K is defined by*

$$(A \setminus B)(t) \stackrel{\text{def}}{=} A(t) \ominus B(t). \tag{19}$$
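The difference operator of Eq. (19) needs nothing beyond the monus of the chosen semiring. The following Python sketch, assuming relations stored as dictionaries from (hashable) tuples to non-zero tags and using our own function names, instantiates it with the truncated minus of **N** and the boolean monus.

```python
from typing import Any, Callable, Dict, Hashable

Rel = Dict[Hashable, Any]  # support only; absent tuples carry the semiring zero


def difference(monus: Callable[[Any, Any], Any], zero: Any, A: Rel, B: Rel) -> Rel:
    """(A \\ B)(t) = A(t) ⊖ B(t), Eq. (19).

    Only tuples in supp(A) are visited: in any m-semiring 0 ⊖ y = 0,
    because 0 is the least element and 0 ⪯ y ⊕ 0 already holds.
    """
    out = {t: monus(k, B.get(t, zero)) for t, k in A.items()}
    return {t: k for t, k in out.items() if k != zero}


def truncated_minus(x: int, y: int) -> int:  # monus of the counting semiring, Eq. (15)
    return max(0, x - y)


def bool_monus(x: bool, y: bool) -> bool:  # monus of the boolean semiring, Eq. (14)
    return x and not y


bag_result = difference(truncated_minus, 0, {"a": 3, "b": 1}, {"a": 1, "b": 2, "c": 5})  # {"a": 2}
set_result = difference(bool_monus, False, {"a": True, "b": True}, {"b": True})          # {"a": True}
```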


Geerts and Poggi show that their resulting algebra coincides with the classical relational algebra, the bag algebra with the monus operator, the probabilistic relational algebra on event tables, the relational algebra on *c*-tables, and the provenance algebra.

## **3. The** L**-relation model**

In this section we recall the definition of the L-relation model, the aim of which was to include similarity relations into the general K-relation framework of annotated relations.

#### **3.1 Domain similarities**

In a similarity context it is typically assumed that all data domains come equipped with a similarity relation or similarity measure.

**Definition 3.1** (Similarity measures Hajdinjak & Bierman (2011))**.** *Given a type τ and a commutative semiring* K = (*K*, ⊕, ⊙, **0**, **1**)*, a* similarity measure *is a function ρ* : *τ* × *τ* → *K such that ρ is reflexive, i.e. ρ*(*x*, *x*) = **1***.*

Following earlier work Shenoi & Melton (1989), only reflexivity of the similarity measure was required; other properties do not hold in general Hajdinjak & Bauer (2009). For example, symmetry does not hold when similarity denotes driving distance between two points in a town, because of one-way streets. Transitivity is another such property, but there are a number of non-transitive similarity measures, e.g. when similarity denotes likeness between two colours.

Allowing only K-valued similarity relations, Hajdinjak and Bierman Hajdinjak & Bierman (2011) modeled an answer to a query as a K-relation in which each tuple is tagged by the similarity value between the tuple and the *ideal tuple*, that is, a tuple that perfectly fits the requirements of the similarity query. Prior to any querying, each *U*-tuple *t* is assumed to have desirability *A*(*t*) = **1** or *A*(*t*) = **0**, depending on whether it is in or out of *A*.

**Example 3.1** (Common similarity measures)**.** *Three common examples of similarity measures are as follows.*

*1. An equality measure ρ* : *τ* × *τ* → **B** *where ρ*(*x*, *y*) *is defined as* true *if x and y are equal and* false *otherwise. Here,* **B** = {false, true} *is the underlying set of the commutative semiring*

$$\mathcal{K}\_{\mathbb{B}} = (\mathbb{B}, \vee, \wedge, \mathsf{false}, \mathsf{true}), \tag{20}$$

*called the* boolean semiring*.*

*2. A fuzzy equality measure ρ* : *τ* × *τ* → [0, 1] *where ρ*(*x*, *y*) *expresses the degree of equality of x and y; the closer x and y are to each other, the closer ρ*(*x*, *y*) *is to* 1*. Here, the unit interval* [0, 1] *is the underlying set of the commutative semiring*

$$\mathcal{K}\_{[0,1]} = ([0,1], \max, \min, 0, 1), \tag{21}$$

*called the* fuzzy semiring*.*

*3. A distance measure ρ* : *τ* × *τ* → [0, *d*<sub>max</sub>] *where ρ*(*x*, *y*) *is the distance from x to y. Here, the closed interval* [0, *d*<sub>max</sub>] *is the underlying set of the commutative semiring*

$$\mathcal{K}\_{[0,d\_{\text{max}}]} = ([0, d\_{\text{max}}], \min, \max, d\_{\text{max}}, 0), \tag{22}$$

*called the* distance semiring*.*
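The three measures of Example 3.1 can be sketched in Python as below. The concrete formula 1/(1 + |x − y|) for fuzzy equality, the bound `D_MAX = 100.0`, and the one-way `ROUTES` table are hypothetical choices for illustration only. Reflexivity must be checked against the unit **1** of each target semiring, which is `True`, `1.0`, and `0` respectively (in the distance semiring **1** is 0).

```python
from typing import Callable, Hashable, Iterable

D_MAX = 100.0  # hypothetical upper bound d_max of the distance semiring


def equality(x: Hashable, y: Hashable) -> bool:
    """Equality measure into the boolean semiring; its unit 1 is True."""
    return x == y


def fuzzy_equality(x: float, y: float) -> float:
    """A hypothetical fuzzy equality into [0, 1]; its unit 1 is 1.0."""
    return 1.0 / (1.0 + abs(x - y))


def distance(x: float, y: float) -> float:
    """Distance measure into [0, d_max]; the distance semiring's unit 1 is 0."""
    return min(abs(x - y), D_MAX)


# Hypothetical one-way street network: driving distances need not be symmetric.
ROUTES = {("a", "b"): 2.0, ("b", "a"): 5.0}


def driving_distance(x: str, y: str) -> float:
    return 0.0 if x == y else min(ROUTES.get((x, y), D_MAX), D_MAX)


def is_reflexive(rho: Callable, one: object, sample: Iterable) -> bool:
    """Definition 3.1 requires only rho(x, x) = 1, with 1 the unit of the target semiring."""
    return all(rho(x, x) == one for x in sample)
```

`driving_distance` is reflexive but not symmetric, which matches the one-way-street example in the text.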

**Definition 3.4.** *Given a commutative semiring* K = (*K*, ⊕, �, **0**, **1**)*, union and intersection of two*

K-Relations and Beyond 27


*Because of this use, the commutative semirings in this example are called* similarity semirings*.*
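The three similarity semirings of Example 3.1 can be written down directly. The following Python sketch is only an illustration that assumes floating-point annotations. The class `Semiring`, the bound `D_MAX = 100.0` and the three sample measures are hypothetical choices, not part of the model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, plus, times, zero, one)."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# (20) boolean semiring (B, or, and, false, true)
BOOLEAN = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)

# (21) fuzzy semiring ([0, 1], max, min, 0, 1)
FUZZY = Semiring(max, min, 0.0, 1.0)

# (22) distance semiring ([0, d_max], min, max, d_max, 0)
D_MAX = 100.0  # hypothetical upper bound on distances
DISTANCE = Semiring(min, max, D_MAX, 0.0)


# Hypothetical similarity measures rho: tau x tau -> K, one per semiring.
def equality(x, y) -> bool:
    return x == y


def fuzzy_equality(x: float, y: float, scale: float = 10.0) -> float:
    # Degree of equality: 1 when x == y, falling linearly to 0 once |x - y| >= scale.
    return max(0.0, 1.0 - abs(x - y) / scale)


def distance(x: float, y: float) -> float:
    # Absolute difference, capped at D_MAX so the value stays in [0, D_MAX].
    return min(abs(x - y), D_MAX)
```

In the distance semiring, the semiring "zero" is the largest distance and the "one" is 0. Lower annotations therefore mean more relevant tuples.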

A predefined environment of similarity measures that can be used for building queries is assumed. For every domain K = (*K*, ⊕, ⊙, **0**, **1**) and every K-relation over a schema *U* = {*a*1 : *τ*1, ..., *an* : *τn*}, there are similarity measures

$$
\rho\_{a\_i} \colon \tau\_i \times \tau\_i \to K, \quad 1 \le i \le n. \tag{23}
$$

#### **3.2 The selection predicate**

In the original Green et al. model (Definition 2.2), the selection predicate maps *U*-tuples to either the zero or the unit element of the semiring. In a similarity context, however, we expect the selection predicate to reflect the relevance, or degree of membership, of a particular tuple in the answer relation, not just the two possibilities of full membership (**1**) or non-membership (**0**). Hajdinjak & Bierman (2011) therefore proposed the following generalization of the original definition.

**Selection:** If *A*: *U*-Tup → *K* and the selection predicate

$$\mathbf{P} \colon U\text{-Tup} \to K \tag{24}$$

maps each *U*-tuple to an element of *K* (instead of mapping to either **0** or **1**), then *σ***P***A*: *U*-Tup → *K* is (still) defined by

$$(\sigma\_{\mathbf{P}} A)(t) = A(t) \odot \mathbf{P}(t). \tag{25}$$

Selection queries can now be classified according to whether they are based on attribute values (as is normal in non-similarity queries) or on the similarity measures. Selection queries can also use constant values.

**Definition 3.2** (Primitive predicate Hajdinjak & Bierman (2011))**.** *Suppose in a schema U* = {*a*1 : *τ*1,..., *an* : *τn*} *the types of attributes ai and aj coincide. Then, given a commutative semiring* K = (*K*, ⊕, ⊙, **0**, **1**)*, for a given binary predicate θ, the* primitive predicate [*ai θ aj*]: *U-Tup* → *K is defined as follows.*

$$[a\_i \; \theta \; a\_j](t) \stackrel{\text{def}}{=} \chi\_{a\_i \theta a\_j}(t) = \begin{cases} \mathbf{1} & \text{if } t(a\_i) \; \theta \; t(a\_j), \\ \mathbf{0} & \text{otherwise.} \end{cases} \tag{26}$$

*In words,* [*ai θ aj*] *behaves as the characteristic map of θ, where θ may be any arithmetic comparison operator among* =*,* ≠*,* <*,* >*,* ≤*,* ≥*.*

**Definition 3.3** (Similarity predicate Hajdinjak & Bierman (2011))**.** *Suppose in a schema U* = {*a*1 : *τ*1,..., *an* : *τn*} *the types of attributes ai and aj coincide. Given a commutative semiring* K = (*K*, ⊕, ⊙, **0**, **1**)*, the* similarity predicate [*ai* like *aj*] : *U-Tup* → *K is defined as follows.*

$$[a\_i \text{ like } a\_j](t) \stackrel{\text{def}}{=} \rho\_{a\_i}(t(a\_i), t(a\_j)). \tag{27}$$

*A symmetric version is as follows.*

$$[a\_i \sim a\_j] \stackrel{\text{def}}{=} [a\_i \text{ like } a\_j] \cup [a\_j \text{ like } a\_i], \tag{28}$$

*where union (*∪*) of selection predicates is defined below.*
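Equations (25) to (27) can be illustrated with a minimal sketch over the fuzzy semiring. It assumes that a K-relation is stored as a finite Python dict from tuples to annotations. The schema, the sample data and the string measure `string_similarity` (built on the standard library's `difflib.SequenceMatcher.ratio`) are hypothetical choices for illustration. The model itself does not prescribe any particular ρ.

```python
import operator
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple

# Hypothetical schema U = {name: str, alias: str}; a U-tuple is a Python tuple
# whose positions follow SCHEMA.
SCHEMA = ("name", "alias")
Row = Tuple[str, ...]
Predicate = Callable[[Row], float]

ZERO, ONE = 0.0, 1.0  # zero and unit of the fuzzy semiring
TIMES = min           # product of the fuzzy semiring


def pos(attr: str) -> int:
    """Position of an attribute inside a U-tuple."""
    return SCHEMA.index(attr)


def primitive(ai: str, theta: Callable[[str, str], bool], aj: str) -> Predicate:
    """[ai theta aj]: characteristic map of the comparison theta, eq. (26)."""
    return lambda t: ONE if theta(t[pos(ai)], t[pos(aj)]) else ZERO


def like(ai: str, aj: str, rho: Callable[[str, str], float]) -> Predicate:
    """[ai like aj](t) = rho(t(ai), t(aj)), eq. (27)."""
    return lambda t: rho(t[pos(ai)], t[pos(aj)])


def string_similarity(x: str, y: str) -> float:
    """Hypothetical rho for strings: difflib's matching ratio, a value in [0, 1]."""
    return SequenceMatcher(None, x, y).ratio()


def select(pred: Predicate, A: Dict[Row, float]) -> Dict[Row, float]:
    """(sigma_P A)(t) = A(t) (x) P(t), eq. (25), over a finite K-relation."""
    return {t: TIMES(a, pred(t)) for t, a in A.items()}


A = {("Smith", "Smith"): 1.0, ("Smith", "Smyth"): 0.9, ("Jones", "Smith"): 1.0}
exact = select(primitive("name", operator.eq, "alias"), A)   # crisp equality
fuzzy = select(like("name", "alias", string_similarity), A)  # graded similarity
```

With the primitive predicate, "Smith"/"Smyth" is dropped to 0. The similarity predicate keeps it with a graded annotation that is no larger than its original 0.9.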


**Definition 3.4.** *Given a commutative semiring* K = (*K*, ⊕, ⊙, **0**, **1**)*, the union and intersection of two selection predicates* **P**1, **P**2 : *U-Tup* → *K are defined as follows.*

$$(\mathbf{P}\_1 \cup \mathbf{P}\_2)(t) \stackrel{\text{def}}{=} \mathbf{P}\_1(t) \oplus \mathbf{P}\_2(t),\tag{29}$$

$$(\mathbf{P}\_1 \cap \mathbf{P}\_2)(t) \stackrel{\text{def}}{=} \mathbf{P}\_1(t) \odot \mathbf{P}\_2(t). \tag{30}$$
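In the fuzzy semiring, Definition 3.4 reduces to taking the pointwise maximum or minimum of two predicates. That is what makes the symmetric predicate (28) symmetric even when ρ itself is not. The sketch below assumes a two-attribute schema and uses a deliberately asymmetric, hypothetical measure `prefix_rho` to show the effect.

```python
from typing import Callable, Tuple

Row = Tuple[str, str]  # hypothetical schema U = {a: str, b: str}
Predicate = Callable[[Row], float]


def union(p1: Predicate, p2: Predicate) -> Predicate:
    """(P1 u P2)(t) = P1(t) (+) P2(t), eq. (29); (+) is max in the fuzzy semiring."""
    return lambda t: max(p1(t), p2(t))


def intersection(p1: Predicate, p2: Predicate) -> Predicate:
    """(P1 n P2)(t) = P1(t) (x) P2(t), eq. (30); (x) is min in the fuzzy semiring."""
    return lambda t: min(p1(t), p2(t))


def prefix_rho(x: str, y: str) -> float:
    """Hypothetical asymmetric measure: 1.0 if x is a prefix of y, else 0.0."""
    return 1.0 if y.startswith(x) else 0.0


def a_like_b(t: Row) -> float:  # [a like b], eq. (27)
    return prefix_rho(t[0], t[1])


def b_like_a(t: Row) -> float:  # [b like a]
    return prefix_rho(t[1], t[0])


a_sim_b = union(a_like_b, b_like_a)  # [a ~ b], eq. (28)
```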

#### **3.3 Relational difference**

Whilst the similarity semirings support a monus operation in the sense of Geerts & Poggi (2010), the induced difference operator in the relational algebra does not behave as desired.

• The fuzzy semiring, K[0,1] = ([0, 1], max, min, 0, 1), satisfies the GP conditions, and the monus operator is as follows.

$$x \ominus y = \min\{z \in [0, 1] : x \le \max\{y, z\}\} = \begin{cases} 0 & \text{if } x \le y, \\ x & \text{if } x > y. \end{cases} \tag{31}$$

This induces the following difference operator in the relational algebra.

$$(A \backslash B)(t) = \begin{cases} 0 & \text{if } A(t) \le B(t), \\ A(t) & \text{if } A(t) > B(t). \end{cases} \tag{32}$$

Hajdinjak & Bierman (2011) point out that this is not the expected definition. First, fuzzy set difference is standardly defined as min{*A*(*t*), 1 − *B*(*t*)} Rosado et al. (2006). Second, in similarity settings only totally irrelevant tuples should be annotated with 0 and excluded as a possible answer Hajdinjak & Mihelič (2006). In the case of the fuzzy set difference *A* \ *B*, these are exactly the tuples *t* where *A*(*t*) = 0 or *B*(*t*) = 1, and certainly not all those where *A*(*t*) ≤ *B*(*t*).

• The distance semiring, K[0,*d*max] = ([0, *d*max], min, max, *d*max, 0), satisfies the GP-conditions, and the monus operator is as follows.

$$x \ominus y = \max\{z \in [0, d\_{\text{max}}] : x \ge \min\{y, z\}\} = \begin{cases} d\_{\text{max}} & \text{if } x \ge y, \\ x & \text{if } x < y. \end{cases} \tag{33}$$

This induces the following difference operator in the relational algebra.

$$(A \backslash B)(t) = \begin{cases} d\_{\text{max}} & \text{if } A(t) \ge B(t), \\ A(t) & \text{if } A(t) < B(t). \end{cases} \tag{34}$$

Again, in the distance setting, we would expect the difference operator to be defined as max{*A*(*t*), *d*max − *B*(*t*)}. Moreover, this is a continuous function in contrast to the step function behaviour of the operator above resulting from the monus definition.
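The step behaviour of the monus can be made concrete. The sketch below, which assumes a hypothetical `D_MAX = 10.0`, checks the closed forms of (31) and (33) against their defining min/max by brute force on small grids. Dyadic values are used so that all floating-point comparisons are exact. The sketch then compares the monus with the expected continuous difference.

```python
D_MAX = 10.0  # hypothetical maximal distance


def fuzzy_monus(x: float, y: float) -> float:
    """Closed form of the fuzzy monus, eq. (31)."""
    return 0.0 if x <= y else x


def distance_monus(x: float, y: float) -> float:
    """Closed form of the distance monus, eq. (33)."""
    return D_MAX if x >= y else x


def fuzzy_expected(a: float, b: float) -> float:
    """Standard fuzzy difference min{A(t), 1 - B(t)}."""
    return min(a, 1.0 - b)


def distance_expected(a: float, b: float) -> float:
    """Expected distance difference max{A(t), d_max - B(t)}."""
    return max(a, D_MAX - b)


# Brute-force the defining min/max of (31) and (33) on small exact grids.
FUZZY_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
DIST_GRID = [0.0, 2.5, 5.0, 7.5, 10.0]

for x in FUZZY_GRID:
    for y in FUZZY_GRID:
        assert min(z for z in FUZZY_GRID if x <= max(y, z)) == fuzzy_monus(x, y)
for x in DIST_GRID:
    for y in DIST_GRID:
        assert max(z for z in DIST_GRID if x >= min(y, z)) == distance_monus(x, y)
```

For example, `fuzzy_monus(0.5, 0.25)` is 0.5 but `fuzzy_monus(0.5, 0.5)` jumps to 0.0. By contrast, `fuzzy_expected(0.5, 0.5)` is still 0.5.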

Rather than using a monus-like operator, Hajdinjak & Bierman (2011) proposed a different approach based on *negation*.

**Definition 3.5** (Negation)**.** *Given a set L equipped with a preorder, a* negation *is an operation* ¬ : *L* → *L that reverses order, x* ≤ *y* ⟹ ¬*y* ≤ ¬*x, and is involutive,* ¬¬*x* = *x.*


**Definition 3.6** (*n*-semiring Hajdinjak & Bierman (2011))**.** *A (commutative) n-semiring* K = (*K*, ⊕, ⊙, **0**, **1**, ¬) *is a (commutative) semiring* (*K*, ⊕, ⊙, **0**, **1**) *equipped with a negation* ¬: *K* → *K (with respect to the preorder on K).*

Provided that K = (*K*, ⊕, ⊙, **0**, **1**, ¬) is a commutative *n*-semiring, the difference of K-relations *A*, *B* : *U*-Tup → *K* may be defined by

$$(A \backslash B)(t) \stackrel{\text{def}}{=} A(t) \odot \neg B(t). \tag{35}$$

Each of the similarity semirings has a negation operation that, in contrast to the monus, gives the expected notion of relational difference.

**Example 3.2** (Relational difference over common similarity measures)**.**

*• In the boolean semiring,* K**B** = (**B**, ∨, ∧, false, true)*, negation can be defined as complementation.*

$$\neg x \stackrel{\text{def}}{=} \begin{cases} \mathsf{true} & \text{if } x = \mathsf{false}, \\ \mathsf{false} & \text{if } x = \mathsf{true}. \end{cases} \tag{36}$$

*From the above we get exactly the monus-based difference of* K**B***-relations.*

$$A(t) \odot \neg B(t) = A(t) \ominus B(t) = \begin{cases} \mathsf{false} & \text{if } B(t) = \mathsf{true}, \\ A(t) & \text{if } B(t) = \mathsf{false}. \end{cases} \tag{37}$$

*• In the fuzzy semiring,* K[0,1] = ([0, 1], max, min, 0, 1)*, ordered by relation* ≤*, we can define a negation operator as*

$$\neg x \stackrel{\text{def}}{=} 1 - x. \tag{38}$$

*In the generalized fuzzy semiring* K[*a*,*b*] = ([*a*, *b*], max, min, *a*, *b*)*, we can define* ¬*x* = *a* + *b* − *x. In the fuzzy semiring we thus get*

$$A(t) \odot \neg B(t) = \min\{A(t), 1 - B(t)\},\tag{39}$$

*and in the generalized fuzzy semiring we get A*(*t*) ⊙ ¬*B*(*t*) = min{*A*(*t*), *a* + *b* − *B*(*t*)}*. These coincide with the fuzzy notions of difference on* [0, 1] *and* [*a*, *b*]*, respectively Rosado et al. (2006).*

*• In the distance semiring,* K[0,*dmax*] = ([0, *dmax*], min, max, *dmax*, 0)*, ordered by relation* ≥*, we can define a negation operator as*

$$\neg x \stackrel{\text{def}}{=} d\_{\text{max}} - x. \tag{40}$$

*We again get the expected notion of difference.*

$$A(t) \odot \neg B(t) = \max\{A(t), d\_{\max} - B(t)\}. \tag{41}$$

*This is a continuous function of A*(*t*) *and B*(*t*)*, and it calculates the greatest distance dmax only if A*(*t*) = *dmax or B*(*t*) = 0*.*
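Example 3.2 can be summarised as one generic operator instantiated three times. The following is a sketch of eq. (35) on finite, dict-based relations. It assumes that a tuple absent from *B* carries the semiring zero. The helper names and `D_MAX = 10.0` are hypothetical.

```python
from typing import Callable, Dict, Hashable, TypeVar

V = TypeVar("V")


def difference(A: Dict[Hashable, V], B: Dict[Hashable, V],
               times: Callable[[V, V], V], neg: Callable[[V], V],
               zero: V) -> Dict[Hashable, V]:
    r"""(A \ B)(t) = A(t) (x) not B(t), eq. (35).

    A tuple absent from B is annotated with zero. Tuples absent from A stay
    at zero, because zero (x) anything = zero, so only A's support is visited.
    """
    return {t: times(a, neg(B.get(t, zero))) for t, a in A.items()}


D_MAX = 10.0  # hypothetical maximal distance


def bool_diff(A, B):
    # (36)-(37): product is "and", negation is complement.
    return difference(A, B, lambda x, y: x and y, lambda x: not x, False)


def fuzzy_diff(A, B):
    # (38)-(39): product is min, negation is 1 - x.
    return difference(A, B, min, lambda x: 1.0 - x, 0.0)


def distance_diff(A, B):
    # (40)-(41): product is max, negation is d_max - x.
    return difference(A, B, max, lambda x: D_MAX - x, D_MAX)
```

Unlike the monus, `fuzzy_diff({"r": 0.5}, {"r": 0.5})` keeps `r` with degree 0.5, and `distance_diff` returns `D_MAX` only when *A*(*t*) = *d*max or *B*(*t*) = 0.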

Moreover, the negation operation gives the same result as the monus when K is the boolean semiring, K**B**, the probabilistic semiring, Kprob, or the semiring on *c*-tables, K*c*-table. Unfortunately, while the provenance semiring, Kprov, and the semiring of counting numbers, K**N**, both contain a monus, neither contains a negation operation. In general, not all *m*-semirings are *n*-semirings, and conversely, not all *n*-semirings are *m*-semirings; see Hajdinjak & Bierman (2011).


#### **3.4 Relational algebra on** L**-relations**

We have seen that the K-relational algebra does not satisfy idempotence of union and self-join because, in general, the sum and product operators of a semiring are not idempotent. In order to satisfy *all* the classical relational identities (including idempotence of union and self-join) and to allow tags to be compared and ordered, Hajdinjak & Bierman (2011) restricted commutative *n*-semirings to De Morgan frames (with the lattice join as sum and the lattice meet as product). Recall that the lattice supremum ∨ and infimum ∧ operators are always idempotent.

**Definition 3.7** (De Morgan frame Salii (1983))**.** *A* De Morgan frame*,* L = (*L*, ⋁, ∧, **0**, **1**, ¬), *is a complete lattice* (*L*, ⋁, ∧, **0**, **1**) *where finite meets distribute over arbitrary joins, i.e.,*

$$
x \wedge \bigvee\_{i} y\_{i} = \bigvee\_{i} (x \wedge y\_{i}), \tag{42}
$$

*and* ¬: *L* → *L is a negation operation.*

**Proposition 3.1** (De Morgan laws Salii (1983))**.** *Given a De Morgan frame* L = (*L*, ⋁, ∧, **0**, **1**, ¬)*, the following laws hold.*

$$\neg \mathbf{0} = \mathbf{1} \tag{43}$$

$$\neg \mathbf{1} = \mathbf{0} \tag{44}$$

$$\neg(x \vee y) = \neg x \wedge \neg y \tag{45}$$

$$\neg(x \wedge y) = \neg x \vee \neg y \tag{46}$$

The similarity semirings from Example 3.1 are De Morgan frames; the same holds for the probabilistic semiring and the semiring on *c*-tables.
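For the fuzzy and distance cases, the claim can be sanity-checked on a finite sub-chain. The five-element sets used below are closed under the lattice operations and the negation, so they are themselves finite De Morgan frames. This is an exhaustive check on those samples, not a proof for the whole intervals. The function name `is_de_morgan_frame` and the bound 10.0 are hypothetical, and dyadic values keep `1.0 - x` exact in floating point.

```python
from functools import reduce
from itertools import combinations, product
from typing import Callable, Sequence


def is_de_morgan_frame(elems: Sequence[float], join: Callable, meet: Callable,
                       bottom: float, top: float, neg: Callable) -> bool:
    """Exhaustively check (42)-(46) and Definition 3.5 on a finite lattice."""
    def leq(x, y):
        return join(x, y) == y  # lattice order induced by the join

    # (42): a meet distributes over the join of every non-empty subset.
    distributive = all(
        meet(x, reduce(join, ys)) == reduce(join, [meet(x, y) for y in ys])
        for x in elems
        for k in range(1, len(elems) + 1)
        for ys in combinations(elems, k)
    )
    # Definition 3.5: negation is involutive and order-reversing.
    negation = all(neg(neg(x)) == x for x in elems) and all(
        leq(neg(y), neg(x)) for x, y in product(elems, repeat=2) if leq(x, y)
    )
    # (43)-(46): De Morgan laws.
    de_morgan = neg(bottom) == top and neg(top) == bottom and all(
        neg(join(x, y)) == meet(neg(x), neg(y))
        and neg(meet(x, y)) == join(neg(x), neg(y))
        for x, y in product(elems, repeat=2)
    )
    return distributive and negation and de_morgan


FUZZY_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
DIST_GRID = [0.0, 2.5, 5.0, 7.5, 10.0]
```

Note that these frames are not Boolean algebras: in the fuzzy case, *x* ∧ ¬*x* = min{0.5, 0.5} = 0.5 for *x* = 0.5, not **0**.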

**Definition 3.8** (L-relation Hajdinjak & Bierman (2011))**.** *Let* L = (*L*, ⋁, ∧, **0**, **1**, ¬) *be a De Morgan frame. An* L-relation *over a schema U* = {*a*1 : *τ*1,..., *an* : *τn*} *is a function A*: *U-Tup* → *L.*

**Definition 3.9** (Relational algebra on L-relations Hajdinjak & Bierman (2011))**.** *Suppose* L = (*L*, ⋁, ∧, **0**, **1**, ¬) *is a De Morgan frame. The operations of the relational algebra on* L*, denoted RA*L*, are defined as follows:*

**Empty relation:** *For any set of attributes U there is* ∅*<sup>U</sup>* : *U-Tup* → *L such that*

$$\emptyset\_U(t) \stackrel{\text{def}}{=} \mathbf{0} \tag{47}$$

*for all U-tuples t.*

**Union:** *If A*, *B*: *U-Tup* → *L then A* ∪ *B* : *U-Tup* → *L is defined by*

$$(A \cup B)(t) \stackrel{\text{def}}{=} A(t) \lor B(t). \tag{48}$$

**Projection:** *If A*: *U-Tup* → *L and V* ⊂ *U, the projection of A on attributes V is defined by*

$$(\pi\_V A)(t) \stackrel{\text{def}}{=} \bigvee\_{(t' \downarrow V) = t \text{ and } A(t') \neq \mathbf{0}} A(t'). \tag{49}$$

**Selection:** *If A*: *U-Tup* → *L and the selection predicate* **P**: *U-Tup* → *L maps each U-tuple to an element of* L*, then σ***P***A*: *U-Tup* → *L is defined by*

$$(\sigma\_{\mathbf{P}} A)(t) \stackrel{\text{def}}{=} A(t) \land \mathbf{P}(t). \tag{50}$$


**Join:** *If A*: *U*1*-Tup* → *L and B*: *U*2*-Tup* → *L, then A* ⋈ *B is the* L*-relation over U*1 ∪ *U*2 *defined by*

$$(A \bowtie B)(t) \stackrel{\text{def}}{=} A(t \downarrow U\_1) \land B(t \downarrow U\_2). \tag{51}$$

**Difference:** *If A*, *B*: *U-Tup* → *L, then A* \ *B*: *U-Tup* → *L is defined by*

$$(A \backslash B)(t) \stackrel{\text{def}}{=} A(t) \land \neg B(t). \tag{52}$$

**Renaming:** *If A*: *U-Tup* → *L and β* : *U* → *U*′ *is a bijection, then ρβA* : *U*′*-Tup* → *L is defined by*

$$(\rho\_{\beta}A)(t) \stackrel{\text{def}}{=} A(t \circ \beta). \tag{53}$$

Unlike K-relations, L-relations need not have finite support: De Morgan frames are complete lattices, which guarantees that the (possibly infinite) join in the definition of projection (49) exists.

Since RA*L* satisfies *all* the main positive relational algebra identities, every algebraic rewrite familiar from query optimization in the classical (positive) relational algebra applies to RA*L* without restriction. The negative identities are a different matter Hajdinjak & Bierman (2011). For fuzzy relations Rosado et al. (2006), many of the familiar laws concerning difference fail. For example, *A* \ *A* = ∅ does not hold in general, because *x* ∧ ¬*x* = min{*x*, 1 − *x*} is nonzero whenever 0 < *x* < 1. It therefore fails in general for the L-relational algebra as well, and some (negative) identities of the classical relational algebra no longer hold.
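A small dictionary-based sketch of RA*L* over the fuzzy frame can make these remarks concrete. It assumes finite support, which the model itself does not require. Relations map tuples (values in schema order) to degrees in [0, 1]. The schema, the data and all function names are hypothetical. The usage shows that union and self-join are idempotent, while *A* \ *A* is not empty.

```python
from typing import Callable, Dict, Tuple

Schema = Tuple[str, ...]
Rel = Dict[Tuple, float]  # U-tuple (values in schema order) -> degree in [0, 1]
# Fuzzy De Morgan frame ([0, 1], max, min, 0, 1, 1 - x); finite support only.


def union(A: Rel, B: Rel) -> Rel:  # eq. (48)
    return {t: max(A.get(t, 0.0), B.get(t, 0.0)) for t in A.keys() | B.keys()}


def select(A: Rel, P: Callable[[Tuple], float]) -> Rel:  # eq. (50)
    return {t: min(a, P(t)) for t, a in A.items()}


def project(A: Rel, U: Schema, V: Schema) -> Rel:  # eq. (49)
    out: Rel = {}
    for t, a in A.items():
        if a != 0.0:
            s = tuple(t[U.index(v)] for v in V)  # t restricted to V
            out[s] = max(out.get(s, 0.0), a)
    return out


def join(A: Rel, U1: Schema, B: Rel, U2: Schema) -> Tuple[Schema, Rel]:  # eq. (51)
    U = U1 + tuple(x for x in U2 if x not in U1)
    out: Rel = {}
    for t1, a1 in A.items():
        for t2, a2 in B.items():
            r1, r2 = dict(zip(U1, t1)), dict(zip(U2, t2))
            # Only pairs that agree on the shared attributes combine.
            if all(r1[c] == r2[c] for c in r1.keys() & r2.keys()):
                merged = {**r1, **r2}
                out[tuple(merged[x] for x in U)] = min(a1, a2)
    return U, out


def difference(A: Rel, B: Rel) -> Rel:  # eq. (52)
    return {t: min(a, 1.0 - B.get(t, 0.0)) for t, a in A.items()}


U: Schema = ("city",)  # hypothetical schema
A: Rel = {("Tampere",): 0.5, ("Turku",): 1.0}
```

Here `union(A, A)` and the self-join both return `A`. However, `difference(A, A)` keeps `("Tampere",)` with degree 0.5, so *A* \ *A* ≠ ∅.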

## **4. The** D**-relation model**

Notice that all tuples across all the K-relations or L-relations in the database, and in the intermediate relations of queries, must be annotated with a value from the same commutative semiring K or De Morgan frame L. Hajdinjak & Bierman (2011) wanted to support several different similarity measures simultaneously (e.g., similarity of strings, driving distance between cities, likelihood of objects being equal) and to use these different measures in queries, even within the same query. To that end, they proposed moving from a tuple-annotated model to an attribute-annotated model and associated every attribute with its own De Morgan frame. They generalized an L-relation, which maps a tuple to an annotation value from a De Morgan frame, to a D*-relation*. A D-relation maps a tuple to a corresponding tuple, referred to as a *De Morgan frame tuple*, that contains an annotation value for every element of the source tuple.

**Definition 4.1** (De Morgan frame schema, De Morgan frame tuple, D-relation Hajdinjak & Bierman (2011))**.**


*• A* De Morgan frame schema*,* D = {*a*1 : L1, ..., *an* : L*n*}*, maps an attribute name, ai, to a De Morgan frame,* L*i* = (*Lai*, ⋁*ai*, ∧*ai*, **0***ai*, **1***ai*, ¬*ai*)*.*

*• A* De Morgan frame tuple*, s* = {*a*1 : *l*1, ..., *an* : *ln*}*, maps an attribute name, ai, to a De Morgan frame element, li.*

*• Given a De Morgan frame schema,* D*, and a schema U, a tuple s is said to be a* De Morgan frame tuple matching D over *U if dom*(*s*) = *dom*(*U*) = *dom*(D)*. The set of all De Morgan frame tuples matching* D *over U is denoted* D(*U*)*-Tup.*

*• A* D-relation over *U is a map from U-Tup to* D(*U*)*-Tup. Its support need not be finite.*

**Definition 4.2** (Relational algebra with similarities Hajdinjak & Bierman (2011))**.** *The operations of the* relational algebra with similarities*, RA*D*, are defined as follows:*

**Empty relation:** *For any set of attributes U and corresponding De Morgan frame schema,* D*, the empty* D*-relation over U,* ∅*U, is defined such that*

$$\emptyset\_U(t)(a) \stackrel{\text{def}}{=} \mathbf{0}\_a \tag{54}$$

*where t is a U-tuple and* D(*a*) = (*La*, ⋁*a*, ∧*a*, **0***a*, **1***a*, ¬*a*)*.*

**Union:** *If A*, *B*: *U-Tup* → D(*U*)*-Tup, then A* ∪ *B*: *U-Tup* → D(*U*)*-Tup is defined by*

$$(A \cup B)(t)(a) \stackrel{\text{def}}{=} A(t)(a) \lor\_{a} B(t)(a) \tag{55}$$

*where* D(*a*) = (*La*, ⋁*a*, ∧*a*, **0***a*, **1***a*, ¬*a*)*.*

**Projection:** *If A*: *U-Tup* → D(*U*)*-Tup and V* ⊂ *U, the projection of A on attributes V is defined by*

$$A(\pi\_V A)(t)(a) \stackrel{\text{def}}{=} \bigvee\_{(t' \downarrow V) = t} A(t')(a) \neq \mathbf{0}\_4 A(t')(a) \tag{56}$$

*where* D(*a*)=(*La*, � *<sup>a</sup>*, ∧*a*, **0***a*, **1***a*, ¬*a*)*.*

**Selection:** *If A*: *U-Tup* → D(*U*)*-Tup and the selection predicate* **P**: *U-Tup* → D(*U*)*-Tup maps each U-tuple to an element of* D(*U*)*-Tup, then σ***P***A*: *U-Tup* → D(*U*)*-Tup is defined by*

$$(\sigma \mathbf{p} \, A)(t)(a) \stackrel{\text{def}}{=} A(t)(a) \land\_a \mathbf{P}(t)(a) \tag{57}$$

*where* D(*a*)=(*La*, � *<sup>a</sup>*, ∧*a*, **0***a*, **1***a*, ¬*a*)*.*

**Join:** *Let* D<sup>1</sup> = {*a*<sup>1</sup> : L1, ..., *an* : L*n*} *and* D<sup>2</sup> = {*b*<sup>1</sup> : L� 1, ..., *bm* : L� *<sup>m</sup>*} *be De Morgan frame schemata. Let their union,* D<sup>1</sup> ∪ D2*, contain an attribute, ci* : L*i, as soon as ci* : L*<sup>i</sup> is in* D<sup>1</sup> *or* D<sup>2</sup> *or both. (If there is an attribute with different corresponding De Morgan frames in* D<sup>1</sup> *and* D2*, a renaming of attributes is needed.) If A*: *U*1*-Tup* → D1(*U*1)*-Tup and B*: *U*2*-Tup* → D2(*U*2)*-Tup, then A B is the* (D<sup>1</sup> ∪ D2)*-relation over U*<sup>1</sup> ∪ *U*<sup>2</sup> *defined as follows.*

$$(A \ltimes B)(t)(a) \stackrel{\text{def}}{=} \begin{cases} A(t \downarrow \mathcal{U}\_1)(a) & \text{if } a \in \mathcal{U}\_1 - \mathcal{U}\_2 \\ B(t \downarrow \mathcal{U}\_2)(a) & \text{if } a \in \mathcal{U}\_2 - \mathcal{U}\_1 \ . \end{cases} \tag{58}$$

**Difference:** *If A*, *B*: *U-Tup* → D(*U*)*-Tup, then A* \ *B*: *U-Tup* → D(*U*)*-Tup is defined by*

$$(A \backslash B)(t)(a) \stackrel{\text{def}}{=} A(t)(a) \land\_a (\neg\_a B(t)(a))\tag{59}$$

*where* D(*a*)=(*La*, � *<sup>a</sup>*, ∧*a*, **0***a*, **1***a*, ¬*a*)*.*

**Renaming:** *If A*: *U-Tup* → D(*U*)*-Tup and β* : *U* → *U*� *is a bijection, then ρβA*: *U*� *-Tup* → D(*U*� )*-Tup is defined by*

$$(\rho\_{\beta}A)(t)(a) \stackrel{\text{def}}{=} A(t)(\beta(a)). \tag{60}$$

As in the case of L-relations it is required that every tuple outside of a similarity database is ranked with the minimal De Morgan frame tuple, {*a*<sup>1</sup> : **0**1,..., *an* : **0***n*}, and every other tuple is ranked either with the maximal De Morgan frame tuple, {*a*<sup>1</sup> : **1**1,..., *an* : **1***n*}, or a smaller De Morgan frame tuple expressing a lower degree of containment of the tuple in the database.
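The operations of Definition 4.2 act attribute by attribute, each attribute in its own De Morgan frame. The following Python sketch is one way to make this concrete; it assumes, for illustration only, that every attribute is annotated in the unit interval [0, 1] with max, min and 1 − x as join, meet and negation (one example of a De Morgan frame). The dict encoding and all function names are hypothetical, and only finite supports are handled.

```python
from typing import Callable, Dict, Tuple

# Hypothetical encoding: a U-tuple is a Python tuple of values in a fixed
# attribute order; a D-relation maps U-tuples to one annotation per attribute.
# Every attribute is assumed to use [0, 1] with join = max, meet = min and
# negation = 1 - x. Tuples absent from the dict carry the all-zero tuple.
Row = Tuple[str, ...]
Ann = Tuple[float, ...]
DRel = Dict[Row, Ann]


def annotation(rel: DRel, t: Row, arity: int) -> Ann:
    """A(t): the frame tuple of t, or the minimal tuple {a_i : 0} outside the support."""
    return rel.get(t, (0.0,) * arity)


def union(a: DRel, b: DRel, arity: int) -> DRel:
    """(A ∪ B)(t)(a) = A(t)(a) ∨_a B(t)(a), attribute by attribute (eq. 55)."""
    return {
        t: tuple(max(x, y) for x, y in zip(annotation(a, t, arity), annotation(b, t, arity)))
        for t in set(a) | set(b)
    }


def select(a: DRel, pred: Callable[[Row], Ann]) -> DRel:
    """(σ_P A)(t)(a) = A(t)(a) ∧_a P(t)(a) (eq. 57)."""
    return {t: tuple(min(x, p) for x, p in zip(v, pred(t))) for t, v in a.items()}


def difference(a: DRel, b: DRel, arity: int) -> DRel:
    """(A \\ B)(t)(a) = A(t)(a) ∧_a ¬_a B(t)(a) with ¬x = 1 - x (eq. 59)."""
    return {
        t: tuple(min(x, 1.0 - y) for x, y in zip(v, annotation(b, t, arity)))
        for t, v in a.items()
    }


def project(a: DRel, keep: Tuple[int, ...]) -> DRel:
    """(π_V A)(t)(a): join (max) of A(t')(a) over all t' with t'↓V = t (eq. 56).

    Zero annotations never change a max, so the side condition A(t')(a) ≠ 0
    is implicit here.
    """
    out: Dict[Row, list] = {}
    for t, v in a.items():
        acc = out.setdefault(tuple(t[i] for i in keep), [0.0] * len(keep))
        for j, i in enumerate(keep):
            acc[j] = max(acc[j], v[i])
    return {k: tuple(v) for k, v in out.items()}


# Example: a relation over (city, hotel) with one annotation per attribute.
A: DRel = {("Paris", "Ritz"): (1.0, 0.75), ("Rome", "Hassler"): (0.5, 1.0)}
B: DRel = {("Paris", "Ritz"): (0.25, 0.25)}
```

With these values, `difference(A, B, 2)` keeps the Rome row unchanged (it is outside the support of `B`) and lowers the Paris row to (0.75, 0.75), since ¬0.25 = 0.75 in every attribute.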

K-Relations and Beyond 33

**Proposition 4.1** (Identities of D-relations Hajdinjak & Bierman (2011))**.** *The following identities hold for the relational algebra on* D*-relations:*

*• union is associative, commutative, idempotent, and has identity* ∅*;*

*• join is associative and commutative, and distributes over union;*

*• projection distributes over union and join;*

*• selection distributes over union and difference;*

*• selections and projections commute with each other;*

*• selection with boolean predicates gives all or nothing, σ*<sub>false</sub>(*A*) = ∅ *and σ*<sub>true</sub>(*A*) = *A, where* false(*t*)(*a*) = **0**<sub>*a*</sub> *and* true(*t*)(*a*) = **1**<sub>*a*</sub> *for* D(*a*) = (*L*<sub>*a*</sub>, ⋁<sub>*a*</sub>, ∧<sub>*a*</sub>, **0**<sub>*a*</sub>, **1**<sub>*a*</sub>, ¬<sub>*a*</sub>)*;*

*• join with an empty relation gives an empty relation, A* ⋈ ∅<sub>*U*</sub> = ∅<sub>*U*</sub>*, where A is a* D*-relation over a schema U;*

*• difference has identity* ∅ *and distributes over union and intersection;*

*• projection of an empty relation gives an empty relation, π*<sub>*V*</sub>(∅) = ∅*.*

Each of the similarity measures associated with the attributes maps to its own De Morgan frame. Again, a predefined environment of similarity measures that can be used for building queries is assumed: for every D-relation over *U*, where D = {*a*<sub>1</sub> : L<sub>1</sub>, ..., *a*<sub>*n*</sub> : L<sub>*n*</sub>}, L<sub>*i*</sub> = (*L*<sub>*i*</sub>, ⋁<sub>*i*</sub>, ∧<sub>*i*</sub>, **0**<sub>*i*</sub>, **1**<sub>*i*</sub>, ¬<sub>*i*</sub>), and *U* = {*a*<sub>1</sub> : *τ*<sub>1</sub>, ..., *a*<sub>*n*</sub> : *τ*<sub>*n*</sub>}, there is a similarity measure

$$\rho_{a_i} \colon \tau_i \times \tau_i \to L_i, \quad 1 \le i \le n. \tag{61}$$

In the D-relation model, primitive and similarity predicates need to be redefined.

**Definition 4.3** (Primitive predicates Hajdinjak & Bierman (2011))**.** *Suppose that in a schema U* = {*a*<sub>1</sub> : *τ*<sub>1</sub>, ..., *a*<sub>*n*</sub> : *τ*<sub>*n*</sub>} *the types of attributes a*<sub>*i*</sub> *and a*<sub>*j*</sub> *coincide. Then for a given binary predicate θ, the* primitive predicate

$$[a_i \,\theta\, a_j] \colon U\text{-Tup} \to \mathcal{D}(U)\text{-Tup} \tag{62}$$

*is defined as follows.*

$$[a_i \,\theta\, a_j](t)(a_k) \stackrel{\text{def}}{=} \begin{cases} \chi_{a_i \theta a_j}(t) & \text{if } k = i \text{ or } k = j, \\ \mathbf{1}_k & \text{otherwise.} \end{cases} \tag{63}$$

In words, [*a*<sub>*i*</sub> *θ* *a*<sub>*j*</sub>] has value **1** in every attribute except *a*<sub>*i*</sub> and *a*<sub>*j*</sub>, where it behaves as the characteristic map of *θ*, defined as follows.

$$\chi_{a_i \theta a_j}(t) \stackrel{\text{def}}{=} \begin{cases} \mathbf{1}_k & \text{if } t(a_i) \,\theta\, t(a_j), \\ \mathbf{0}_k & \text{otherwise.} \end{cases} \tag{64}$$

Similarity predicates annotate tuples based on the similarity measures.

**Definition 4.4** (Similarity predicates Hajdinjak & Bierman (2011))**.** *Suppose that in a schema U* = {*a*<sub>1</sub> : *τ*<sub>1</sub>, ..., *a*<sub>*n*</sub> : *τ*<sub>*n*</sub>} *the types of attributes a*<sub>*i*</sub> *and a*<sub>*j*</sub> *coincide. The similarity predicate* [*a*<sub>*i*</sub> like *a*<sub>*j*</sub>] : *U-Tup* → D(*U*)*-Tup is defined as follows.*

$$[a_i \text{ like } a_j](t)(a_k) \stackrel{\text{def}}{=} \begin{cases} \rho_{a_i}(t(a_i), t(a_j)) & \text{if } a_k = a_i, \\ \rho_{a_j}(t(a_i), t(a_j)) & \text{if } a_k = a_j, \\ \mathbf{1}_k & \text{otherwise.} \end{cases} \tag{65}$$

14 Will-be-set-by-IN-TECH

*In words,* [*a*<sub>*i*</sub> like *a*<sub>*j*</sub>] *measures the similarity of attributes a*<sub>*i*</sub> *and a*<sub>*j*</sub>*, each with its own similarity measure. The symmetric version is defined as follows.*

$$[a_i \sim a_j] \stackrel{\text{def}}{=} [a_i \text{ like } a_j] \cup [a_j \text{ like } a_i]. \tag{66}$$

Now union and intersection of selection predicates are computed component-wise.

Given the similarity measures associated with attributes, it is possible to define similarity-based variants of other familiar relational operators, such as similarity-based joins Hajdinjak & Bierman (2011). Such an operator joins two rows not only when their join attributes have equal values, but also when the values are similar.

#### **5. A common framework**

In this section we explore whether there is a common domain of annotations suitable for all kinds of annotated relations, and we define a general model of K-, L-, and D-relations.

#### **5.1 A common annotation domain**

We have recalled two notions of difference on annotated relations: the monus-based difference proposed by Geerts and Poggi Geerts & Poggi (2010) and the negation-based difference proposed by Hajdinjak and Bierman Hajdinjak & Bierman (2011). We have seen in §3.3 that the monus-based difference does not have the qualities expected in a fuzzy context. The negation-based difference, on the other hand, agrees with the standard fuzzy difference, but it is not defined for bag semantics (and provenance). More precisely, the semiring of counting numbers, K<sub>**N**</sub> = (**N**, +, ·, 0, 1), cannot be extended with a negation operation. (The same holds for the provenance semiring.)

We could try to modify the semiring of counting numbers so that negation can be defined. For instance, if we replace **N** by **Z**, we get the ring of integers, (**Z**, +, ·, 0, 1), where negation can be defined as ¬*x* def= −*x*. This implies (*A* \ *B*)(*t*) = *A*(*t*) · (−*B*(*t*)) = −*A*(*t*) · *B*(*t*), which is not equal to the standard difference of relations annotated with the tuples' multiplicities Montagna & Sebastiani (2001): for *A*(*t*) = 3 and *B*(*t*) = 1, for example, it gives −3 instead of the expected 2. Other modifications would yield the so-called *tropical semirings* Aceto et al. (2001), whose underlying carrier set is a subset of the real numbers **R** equipped with minimum or maximum as sum and addition as product.

Let us now study the properties of the annotation structures of both approaches.

**Proposition 5.1** (Identities in an *m*-semiring Bosbach (1965))**.** *The notion of an m-semiring is characterized by the properties of commutative semirings and the following identities involving* ⊖*.*

$$x \ominus x = \mathbf{0}, \tag{67}$$

$$\mathbf{0} \ominus x = \mathbf{0}, \tag{68}$$

$$x \oplus (y \ominus x) = y \oplus (x \ominus y), \tag{69}$$

$$x \ominus (y \oplus z) = (x \ominus y) \ominus z, \tag{70}$$

$$x \otimes (y \ominus z) = (x \otimes y) \ominus (x \otimes z). \tag{71}$$
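For the counting numbers, the monus is truncated subtraction, *x* ⊖ *y* = max(*x* − *y*, 0). The following Python sketch brute-forces identities (67)-(71) on a small range, reading ⊕ and ⊗ as + and ·; this is evidence on a finite range, not a proof. It also contrasts the standard bag difference with the negation-based difference in **Z** discussed above. All function names are illustrative.

```python
from itertools import product


def monus(x: int, y: int) -> int:
    """Truncated subtraction on N: the least z with x <= y + z."""
    return max(x - y, 0)


def m_semiring_identities_hold(x: int, y: int, z: int) -> bool:
    """Check identities (67)-(71) for one triple of counting numbers."""
    return (
        monus(x, x) == 0                                # (67)
        and monus(0, x) == 0                            # (68)
        and x + monus(y, x) == y + monus(x, y)          # (69)
        and monus(x, y + z) == monus(monus(x, y), z)    # (70)
        and x * monus(y, z) == monus(x * y, x * z)      # (71)
    )


def bag_difference(a: int, b: int) -> int:
    """Standard difference of multiplicities: A(t) ⊖ B(t)."""
    return monus(a, b)


def ring_negation_difference(a: int, b: int) -> int:
    """Negation-based difference in Z with ¬x = -x: A(t) · (-B(t))."""
    return a * -b


# Exhaustive check on a small range (evidence, not a proof).
assert all(m_semiring_identities_hold(x, y, z) for x, y, z in product(range(8), repeat=3))
print(bag_difference(3, 1), ring_negation_difference(3, 1))  # 2 -3
```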



Notice that in a De Morgan frame, too, a difference-like operation may be defined:

$$x \div y \stackrel{\text{def}}{=} x \wedge \neg y. \tag{72}$$

Clearly, negation is then expressed as ¬*x* = **1** ÷ *x*.

**Proposition 5.2** (Identities in a De Morgan frame)**.** *In a De Morgan frame the following identities involving* ÷ *hold.*

$$\mathbf{1} \div \mathbf{0} = \mathbf{1}, \tag{73}$$

$$\mathbf{1} \div \mathbf{1} = \mathbf{0}, \tag{74}$$

$$\mathbf{1} \div (x \lor y) = (\mathbf{1} \div x) \land (\mathbf{1} \div y), \tag{75}$$

$$\mathbf{1} \div (x \wedge y) = (\mathbf{1} \div x) \vee (\mathbf{1} \div y), \tag{76}$$

$$\mathbf{0} \div x = \mathbf{0}, \tag{77}$$

$$\mathbf{1} \div (\mathbf{1} \div x) = x, \tag{78}$$

$$x \div (\mathbf{1} \div y) = x \wedge y, \tag{79}$$

$$\mathbf{1} \div (\mathbf{1} \div (x \lor y)) = x \lor y, \tag{80}$$

$$\mathbf{1} \div (x \div y) = (\mathbf{1} \div x) \lor y, \tag{81}$$

$$(x \div y) \wedge y = x \wedge (y \div y), \tag{82}$$

$$(x \div y) \lor y = (x \lor y) \land (\mathbf{1} \div (y \div y)). \tag{83}$$

*Proof.* The first four identities are exactly the De Morgan laws from Proposition 3.1. The rest holds by simple expansion of definitions and/or is implied by the De Morgan laws.

Notice the differences between the properties of the monus-based difference ⊖ in an *m*-semiring and those of the negation-based difference ÷ in a De Morgan frame. For instance, in a De Morgan frame we do *not* have *x* ÷ *x* = **0** in general, because *x* ∧ ¬*x* need not be **0**: in the unit interval with ∧ = min and ¬*x* = 1 − *x*, for example, 0.5 ÷ 0.5 = 0.5.
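To see identities (73)-(83) and the failure of *x* ÷ *x* = **0** side by side, here is a small Python sketch. It assumes the unit interval with min and max as meet and join and ¬*x* = 1 − *x* as the example De Morgan frame (a complete chain, so finite meets distribute over arbitrary joins), and uses `Fraction` to keep the arithmetic exact; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example frame: the unit interval with min/max as meet/join and
# negation 1 - x. Fractions keep the arithmetic exact.
ZERO, ONE = Fraction(0), Fraction(1)


def neg(x: Fraction) -> Fraction:
    return ONE - x


def div(x: Fraction, y: Fraction) -> Fraction:
    """Negation-based difference x ÷ y = x ∧ ¬y (eq. 72)."""
    return min(x, neg(y))


def de_morgan_identities_hold(x: Fraction, y: Fraction) -> bool:
    """Check identities (73)-(83) for one pair of frame elements."""
    return all([
        div(ONE, ZERO) == ONE,                                       # (73)
        div(ONE, ONE) == ZERO,                                       # (74)
        div(ONE, max(x, y)) == min(div(ONE, x), div(ONE, y)),        # (75)
        div(ONE, min(x, y)) == max(div(ONE, x), div(ONE, y)),        # (76)
        div(ZERO, x) == ZERO,                                        # (77)
        div(ONE, div(ONE, x)) == x,                                  # (78)
        div(x, div(ONE, y)) == min(x, y),                            # (79)
        div(ONE, div(ONE, max(x, y))) == max(x, y),                  # (80)
        div(ONE, div(x, y)) == max(div(ONE, x), y),                  # (81)
        min(div(x, y), y) == min(x, div(y, y)),                      # (82)
        max(div(x, y), y) == min(max(x, y), div(ONE, div(y, y))),    # (83)
    ])


GRID = [Fraction(k, 4) for k in range(5)]
assert all(de_morgan_identities_hold(x, y) for x, y in product(GRID, repeat=2))

half = Fraction(1, 2)
print(div(half, half))  # 1/2, so x ÷ x = 0 fails here
```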

However, since neither of the proposed notions of difference gives the expected result for all kinds of annotated relations, an annotation structure different from *m*-semirings and De Morgan frames is needed. Observe that, by definition, a complete (even a bounded) distributive lattice, L = (*L*, ∨, ∧, **0**, **1**), is a commutative semiring whose natural order ⪯ is the lattice order, with *a* ⊕ *b* = *a* ∨ *b* and *a* ⊗ *b* = *a* ∧ *b* for every *a*, *b* in *L*. Because lattice completeness ensures the existence of a smallest element in every set, and hence the existence of the monus (see Definition 2.3 on GP-conditions), a complete distributive lattice is an *m*-semiring. On the other hand, if a commutative semiring, K = (*K*, ⊕, ⊗, **0**, **1**), is partially ordered by ⪯ and any two elements of *K* have an infimum and a supremum, it is a lattice, not necessarily bounded Davey & Priestley (1990). The lattice meet and join are then determined by the partial order ⪯, and they are, in general, different from ⊕ and ⊗. Since **0** ⊕ *a* = *a*, we have **0** ⪯ *a* for any *a* ∈ *K*, so **0** is the least element of the lattice. In general, no similar observation holds for **1**, which therefore need not be the greatest element of the lattice.

The underlying carrier sets of all the semirings considered are partially ordered sets, even distributive lattices. The unbounded lattices among them (i.e., K<sub>**N**</sub> and K<sub>prov</sub>) can be converted into bounded (even complete) lattices by adding a greatest element. To achieve this, we just need to replace **N** by **N** ∪ {∞} and define appropriate calculation rules for ∞.

**Lemma 5.1** (Making unbounded partially ordered semirings bounded)**.**


*1. The semiring of counting numbers,* K**<sup>N</sup>** = (**N**, +, ·, 0, 1)*, partially ordered by*

$$n \preceq m \iff n \le m, \tag{84}$$

*may be extended to the partially ordered commutative semiring* (**N** ∪ {∞}, +, ·, 0, 1) *by defining* ∞ + *n* = ∞ *and* ∞ · *n* = ∞ *except* ∞ · 0 = 0*. The partial order* ⪯ *now determines a complete lattice structure,* (**N** ∪ {∞}, max, min, 0, ∞)*.*

*2. The provenance semiring,* K<sub>*prov*</sub> = (**N**[*X*], +, ·, 0, 1)*, partially ordered by*

$$f[X] \preceq g[X] \iff f_\alpha \le g_\alpha \text{ for all } \alpha \in I, \tag{85}$$

*where f*[*X*] = ∑<sub>*α*∈*I*</sub> *f*<sub>*α*</sub>*x*<sup>*α*</sup> *and g*[*X*] = ∑<sub>*α*∈*I*</sub> *g*<sub>*α*</sub>*x*<sup>*α*</sup>*, may be extended to the commutative semiring* ((**N** ∪ {∞})[*X*], +, ·, 0, 1) *by defining x*<sup>∞</sup> · *x*<sup>*n*</sup> = *x*<sup>∞</sup> *as well as* ∞ + *n* = ∞ *and* ∞ · *n* = ∞ *except* ∞ · 0 = 0 *as before. The partial order* ⪯ *now determines a complete lattice structure on* (**N** ∪ {∞})[*X*] *with*

$$f[X] \wedge g[X] = \sum_{\alpha \in I} \min\{f_\alpha, g_\alpha\}\, x^\alpha, \tag{86}$$

$$f[X] \vee g[X] = \sum_{\alpha \in I} \max\{f_\alpha, g_\alpha\}\, x^\alpha. \tag{87}$$

*The least element of the lattice is the zero polynomial,* 0*, and the greatest element is the polynomial with all coefficients equal to* ∞*.*
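The calculation rules of Lemma 5.1 fit in a few lines of code. The following Python sketch assumes `math.inf` as a stand-in for ∞ and a hypothetical dict encoding of provenance polynomials (exponent vector α mapped to coefficient *f*<sub>*α*</sub>). It covers the ∞ rules for sum and product and the coefficient-wise meet and join (86)-(87); the exponent rule *x*<sup>∞</sup> · *x*<sup>*n*</sup> = *x*<sup>∞</sup> is not modelled.

```python
import math
from typing import Dict, Tuple, Union

INF = math.inf
ExtNat = Union[int, float]  # a counting number, or INF standing for ∞


def ext_add(m: ExtNat, n: ExtNat) -> ExtNat:
    """Sum in N ∪ {∞}: ∞ + n = ∞."""
    return INF if INF in (m, n) else m + n


def ext_mul(m: ExtNat, n: ExtNat) -> ExtNat:
    """Product in N ∪ {∞}: ∞ · n = ∞, except ∞ · 0 = 0.

    Python's float arithmetic would give nan for inf * 0, hence the explicit case.
    """
    if m == 0 or n == 0:
        return 0
    return INF if INF in (m, n) else m * n


# Provenance polynomials with coefficients in N ∪ {∞}: a dict from exponent
# vectors α to coefficients f_α; absent monomials have coefficient 0.
Poly = Dict[Tuple[int, ...], ExtNat]


def poly_meet(f: Poly, g: Poly) -> Poly:
    """Coefficient-wise min, as in (86); zero coefficients are dropped."""
    out = {a: min(f.get(a, 0), g.get(a, 0)) for a in set(f) | set(g)}
    return {a: c for a, c in out.items() if c != 0}


def poly_join(f: Poly, g: Poly) -> Poly:
    """Coefficient-wise max, as in (87)."""
    return {a: max(f.get(a, 0), g.get(a, 0)) for a in set(f) | set(g)}


f: Poly = {(1, 0): 2, (0, 1): 5}   # 2·x1 + 5·x2
g: Poly = {(1, 0): INF}            # ∞·x1
```

The greatest element of the polynomial lattice, the polynomial with all coefficients ∞, has no finite dict representation in this encoding; the sketch only handles polynomials with finitely many nonzero coefficients.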

To summarize, a complete distributive lattice is an *m*-semiring. If the lattice also has a negation, there are two difference-like operations: the monus ⊖ and the operation ÷ induced by negation. For some classes of annotated relations only one of them gives the standard notion of relational difference (⊖ for bag semantics and provenance, ÷ for fuzzy semantics), while for others the two coincide (e.g., classical set semantics, probabilistic relations, and relations on *c*-tables).

**Proposition 5.3** (General annotation structure)**.** *Complete distributive lattices with finite meets distributing over arbitrary joins are suitable codomains for all considered annotated relations.*

*Proof.* The boolean semiring, the probabilistic semiring, the semiring on *c*-tables, the similarity semirings, as well as the semiring of counting numbers and the provenance semiring (see Lemma 5.1), can all be extended to a complete distributive lattice in which finite meets distribute over arbitrary joins. The latter property allows us to model infinite relations satisfying all the desired relational identities from Proposition 4.1, including commuting selections and projections. Relational difference may be modeled with the existing monus, ⊖, or with ÷ if the lattice is a De Morgan frame where a negation exists. The other (positive) relational operations are modeled using lattice meet, ∧, and join, ∨, or semiring sum, ⊕, and product, ⊗.

#### **5.2 A common model**

Recall that Green et al. Green et al. (2007) defined a K-relation over *U* = {*a*<sub>1</sub> : *τ*<sub>1</sub>, ..., *a*<sub>*n*</sub> : *τ*<sub>*n*</sub>} as a function *A*: *U*-Tup → *K* with finite support. The finite-support requirement was made to ensure the existence of the sum in the definition of relational projection. When the commutative semiring K = (*K*, ⊕, ⊗, **0**, **1**) was replaced by a De Morgan frame, L = (*L*, ⋁, ∧, **0**, **1**, ¬), the finite-support requirement became unnecessary: the existence of the join in the definition of projection was guaranteed by the completeness of the codomain.

**Union:** *If A*, *B*: *U-Tup* → C(*U*)*-Tup, then A* ∪ *B*: *U-Tup* → C(*U*)*-Tup is defined by*

*U-tuple to an element of* C(*U*)*-Tup, then σ***P***A*: *U-Tup* → C(*U*)*-Tup is defined by*

**Difference:** *If A*, *B*: *U-Tup* → C(*U*)*-Tup, then A* \ *B*: *U-Tup* → C(*U*)*-Tup is defined by*

**Renaming:** *If A*: *U-Tup* → C(*U*)*-Tup and β* : *U* → *U*� *is a bijection, then ρβA*: *U*�

(*ρβA*)(*t*)(*a*) def

Relational algebra RA<sup>C</sup> still satisfies *all* the main positive relational algebra identities.

**Proposition 5.5** (Identities of C-relations)**.** *The following identities hold for the relational algebra*

*• selection with boolean predicates gives all or nothing, σ*false(*A*) = ∅ *and σ*true(*A*) = *A, where*

*• join with an empty relation gives an empty relation, A* ∅*<sup>U</sup>* = ∅*<sup>U</sup> where A is a* C*-relation over*

�

*<sup>a</sup>*, ∧*a*, **0***a*, **1***a*)*;*

(*<sup>A</sup>* \ *<sup>B</sup>*)(*t*)(*a*) def

(*σ***P***A*)(*t*)(*a*) def

**Projection:** *If A*: *U-Tup* → C(*U*)*-Tup and V* ⊂ *U, the projection of A on attributes V is defined by*

K-Relations and Beyond 37

**Selection:** *If A*: *U-Tup* → C(*U*)*-Tup and the selection predicate* **P**: *U-Tup* → C(*U*)*-Tup maps each*

*A*: *U*1*-Tup* → C1(*U*1)*-Tup and B*: *U*2*-Tup* → C2(*U*2)*-Tup, then A B is the* (C<sup>1</sup> ∪ C2)*-relation*

*A*(*t* ↓ *U*1)(*a*) *<sup>a</sup> B*(*t* ↓ *U*2)(*a*)

<sup>=</sup> �(*t*�↓*V*)=*t and A*(*t*�)(*a*)�=**0***<sup>a</sup> <sup>A</sup>*(*<sup>t</sup>*

= *A*(*t*)(*a*)*aB*(*t*)(*a*). (90)

�

= *A*(*t*)(*a*) *<sup>a</sup>* **P**(*t*)(*a*). (92)

= *A*(*t*)(*a*) −*<sup>a</sup> B*(*t*)(*a*). (94)

= *A*(*t*)(*β*(*a*)). (95)

1, ..., *bm* : L�

*A*(*t* ↓ *U*1)(*a*) *if a* ∈ *U*<sup>1</sup> − *U*<sup>2</sup> *B*(*t* ↓ *U*2)(*a*) *if a* ∈ *U*<sup>2</sup> − *U*<sup>1</sup>

)(*a*). (91)

*<sup>m</sup>*} *be annotation schemata. If*

. (93)

*-Tup* →

(*<sup>A</sup>* <sup>∪</sup> *<sup>B</sup>*)(*t*)(*a*) def

(*π<sup>V</sup> <sup>A</sup>*)(*t*)(*a*) def

**Join:** *Let* C<sup>1</sup> = {*a*<sup>1</sup> : L1, ..., *an* : L*n*} *and* C<sup>2</sup> = {*b*<sup>1</sup> : L�

=

⎧ ⎪⎨

⎪⎩

*• union is associative, commutative, idempotent, and has identity* ∅*;*

*• difference has identity* ∅ *and distributes over union and intersection;*

*• projection of an empty relation gives an empty relation, πV*(∅) = ∅*.*

*• join is associative and commutative, and distributes over union;*

*• selection distributes over union and difference;*

*• selections and projections commute with each other;*

false(t)(a) = **0**<sup>a</sup> *and* true(t)(a) = **1**<sup>a</sup> *for* C(*a*)=(*La*,

*• projection distributes over union and join;*

(*<sup>A</sup> <sup>B</sup>*)(*t*)(*a*) def

*over U*<sup>1</sup> ∪ *U*<sup>2</sup> *defined as follows.*

)*-Tup is defined by*

C(*U*�

*on* C*-relations:*

*a schema U;*

To model similarity relations more efficiently, Hajdinjak and Bierman Hajdinjak & Bierman (2011) introduced a D-relation over *U* as a function from *U*-Tup to D(*U*)-Tup assigning every element of *U*-Tup (row of a table) a tuple of different annotation values. We adopt Definition 4.1 to the proposed general annotation structure, and show that a tuple-annotated model may be injectively mapped to an attribute-annotated model.

**Definition 5.1** (Annotation schema, annotation tuple, C-relation)**.**

- An *annotation schema*, $\mathcal{C} = \lbrace a\_1 : \mathcal{L}\_1, \dots, a\_n : \mathcal{L}\_n \rbrace$, over $U = \lbrace a\_1 : \tau\_1, \dots, a\_n : \tau\_n \rbrace$ maps an attribute name $a\_i$ to a complete distributive lattice in which finite meets distribute over arbitrary joins, $\mathcal{L}\_i = (L\_{a\_i}, \vee\_{a\_i}, \wedge\_{a\_i}, \mathbf{0}\_{a\_i}, \mathbf{1}\_{a\_i})$.
- An *annotation tuple*, $s = \lbrace a\_1 : l\_1, \dots, a\_n : l\_n \rbrace$, maps an attribute name $a\_i$ to an element $l\_i$ of a complete distributive lattice in which finite meets distribute over arbitrary joins. The set of all annotation tuples matching $\mathcal{C}$ over $U$ is denoted $\mathcal{C}(U)$-Tup.
- A *$\mathcal{C}$-relation* over $U$ is a finite map from $U$-Tup to $\mathcal{C}(U)$-Tup.

**Proposition 5.4** (Injection of a tuple-annotated model to an attribute-annotated model)**.** *Let $\mathcal{A}$ be the class of all functions $A : U\text{-Tup} \to L$, where $U$ is any relational schema and $\mathcal{L} = (L, \vee, \wedge, \mathbf{0}, \mathbf{1})$ is any complete distributive lattice with finite meets distributing over arbitrary joins. Let $\mathcal{B}$ be the class of all $\mathcal{C}$-relations over $U$, $B : U\text{-Tup} \to \mathcal{C}(U)\text{-Tup}$, where $\mathcal{C}$ is the annotation schema that assigns $\mathcal{L}$ to every attribute of $U$. There is an injective function $F : \mathcal{A} \to \mathcal{B}$ defined by*

$$F(A)(t)(a\_i) \stackrel{\text{def}}{=} A(t) \tag{88}$$

*for all attributes $a\_i$ in $U$ and tuples $t \in U$-Tup.*

*Proof.* For $A\_1, A\_2 \in \mathcal{A}$ with $A\_1 \neq A\_2$ there is a tuple $t$ with $A\_1(t) \neq A\_2(t)$, and then $F(A\_1)(t)(a\_i) = A\_1(t) \neq A\_2(t) = F(A\_2)(t)(a\_i)$ for any attribute $a\_i$, so $F(A\_1) \neq F(A\_2)$.

Proposition 5.4 says that moving from tuple-annotated relations to attribute-annotated relations does not prevent us from correctly modeling the examples covered by the K-relation model, in which each tuple is annotated with a single value from K: the annotation value simply appears several times, once per attribute. We thus propose a model of C-relations, a common model of K-, L- and D-relations, that is attribute-annotated. The definitions of union, projection, selection, and join of C-relations may be based on the lattice join and meet operations (as in Definitions 3.9 and 4.2) or, if there exist semiring sum and product operations different from lattice join and meet, the positive relational operations may be defined using these additional semiring operations (as in Definition 2.2). The definition of relational difference may be based on the monus or, when dealing with De Morgan frames where a negation exists, on the derived ÷ operation.

**Definition 5.2** (Relational algebra on C-relations)**.** *Consider $\mathcal{C}$-relations where all the lattices $\mathcal{L}\_i = (L\_{a\_i}, \vee\_{a\_i}, \wedge\_{a\_i}, \mathbf{0}\_{a\_i}, \mathbf{1}\_{a\_i})$ from the annotation schema $\mathcal{C} = \lbrace a\_1 : \mathcal{L}\_1, \dots, a\_n : \mathcal{L}\_n \rbrace$ are complete distributive lattices in which finite meets distribute over arbitrary joins. Let $\nabla\_{a\_i}$ and $\triangle\_{a\_i}$ stand for either the lattice operations $\vee\_{a\_i}$ and $\wedge\_{a\_i}$ or some other semiring operations $\oplus\_{a\_i}$ and $\otimes\_{a\_i}$ defined on the carrier set $L\_{a\_i}$ of $\mathcal{L}\_i$, respectively. Let $-\_{a\_i}$ stand for either the monus $\dot{-}\_{a\_i}$ or a $\div\_{a\_i}$ operation defined on $L\_{a\_i}$. The operations of the relational algebra on $\mathcal{C}$, denoted $RA^{\mathcal{C}}$, are defined as follows.*

**Empty relation:** *For any set of attributes $U$ and corresponding annotation schema $\mathcal{C}$, the empty $\mathcal{C}$-relation over $U$, $\emptyset\_U$, is defined by*

$$\emptyset\_U(t)(a) \stackrel{\text{def}}{=} \mathbf{0}\_{a}. \tag{89}$$
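Definition 5.1, the map $F$ of Proposition 5.4 and the empty relation (89) can be made concrete with the following minimal Python sketch. It is not the authors' implementation: it assumes a finite-support encoding in which a C-relation is a dictionary from U-tuples to annotation tuples, with tuples absent from the dictionary carrying $\mathbf{0}\_a$ on every attribute, and it uses two illustrative lattices, the fuzzy chain [0, 1] and the powerset of a hypothetical source set {"s1", "s2"}. The names `row`, `Lattice`, `annotation`, `empty_relation`, and `to_attribute_annotated` are hypothetical helpers.

```python
from typing import Callable, Dict, Hashable, Iterable, NamedTuple, Tuple

Row = Tuple[Tuple[str, Hashable], ...]   # a U-tuple as sorted (attribute, value) pairs
AnnotationTuple = Dict[str, object]      # s = {a_1: l_1, ..., a_n: l_n}
CRelation = Dict[Row, AnnotationTuple]   # finite map U-Tup -> C(U)-Tup


def row(**values: Hashable) -> Row:
    """Build a hashable U-tuple, e.g. row(name="Ann", city="Oslo")."""
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    """A lattice L_i = (L_ai, join, meet, 0, 1), assumed complete and distributive."""
    join: Callable
    meet: Callable
    bottom: object
    top: object


# Illustrative lattices (assumptions): the fuzzy chain [0, 1] and the
# powerset of a hypothetical set of data sources {"s1", "s2"}.
FUZZY = Lattice(max, min, 0.0, 1.0)
SOURCES = Lattice(frozenset.union, frozenset.intersection,
                  frozenset(), frozenset({"s1", "s2"}))


def annotation(rel: CRelation, t: Row, schema: Dict[str, Lattice]) -> AnnotationTuple:
    """Return rel(t); tuples outside the finite support carry 0_a on every attribute."""
    return rel.get(t, {a: lat.bottom for a, lat in schema.items()})


def empty_relation() -> CRelation:
    """The empty C-relation (89): nothing is stored, so every lookup yields 0_a."""
    return {}


def to_attribute_annotated(A: Dict[Row, object], attributes: Iterable[str]) -> CRelation:
    """The map F of Proposition 5.4: F(A)(t)(a_i) = A(t) for every attribute a_i."""
    attrs = list(attributes)
    return {t: {a: value for a in attrs} for t, value in A.items()}


A1 = {row(name="Ann", city="Oslo"): 0.8, row(name="Bob", city="Rome"): 0.3}
R1 = to_attribute_annotated(A1, ["name", "city"])
print(R1[row(name="Ann", city="Oslo")])   # {'name': 0.8, 'city': 0.8}
```

Because this $F$ copies the single tuple annotation to every attribute, two tuple-annotated relations that differ on some tuple give attribute-annotated relations that differ on that tuple as well, which mirrors the injectivity argument in the proof.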

**Union:** *If A*, *B*: *U-Tup* → C(*U*)*-Tup, then A* ∪ *B*: *U-Tup* → C(*U*)*-Tup is defined by*

$$(A \cup B)(t)(a) \stackrel{\text{def}}{=} A(t)(a) \nabla\_a B(t)(a). \tag{90}$$
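The union (90) works attribute by attribute, so each attribute may use its own lattice. The sketch below is a minimal illustration, assuming the finite-support dictionary encoding (absent tuples carry $\mathbf{0}\_a$) and taking $\nabla\_a$ to be the lattice join; the schema, which annotates a hypothetical "name" attribute in the chain [0, 1] and "city" in the two-element Boolean lattice, and the helper names are illustrative only.

```python
import operator
from typing import Callable, Dict, Hashable, NamedTuple, Tuple

Row = Tuple[Tuple[str, Hashable], ...]     # U-tuple: sorted (attribute, value) pairs
CRelation = Dict[Row, Dict[str, object]]   # finite map U-Tup -> C(U)-Tup


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    join: Callable   # plays the role of ∇_a
    bottom: object   # 0_a


def union(A: CRelation, B: CRelation, schema: Dict[str, Lattice]) -> CRelation:
    """(A ∪ B)(t)(a) = A(t)(a) ∇_a B(t)(a), computed attribute by attribute."""
    result: CRelation = {}
    for t in A.keys() | B.keys():  # tuples outside both supports stay at 0_a
        result[t] = {
            a: lat.join(A.get(t, {}).get(a, lat.bottom),
                        B.get(t, {}).get(a, lat.bottom))
            for a, lat in schema.items()
        }
    return result


# Hypothetical schema: "name" in the fuzzy chain [0, 1], "city" in {False, True}.
SCHEMA = {"name": Lattice(max, 0.0), "city": Lattice(operator.or_, False)}
A = {row(name="Ann", city="Oslo"): {"name": 1.0, "city": False}}
B = {row(name="Ann", city="Oslo"): {"name": 0.6, "city": True},
     row(name="Bob", city="Rome"): {"name": 0.2, "city": False}}
U = union(A, B, SCHEMA)
```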

**Projection:** *If A*: *U-Tup* → C(*U*)*-Tup and V* ⊂ *U, the projection of A on attributes V is defined by*

$$(\pi\_V A)(t)(a) \stackrel{\text{def}}{=} \bigvee\_{(t' \downarrow V) = t \text{ and } A(t')(a) \neq \mathbf{0}\_a} A(t')(a). \tag{91}$$
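Projection (91) joins, for each attribute $a \in V$, the non-zero annotations of all tuples $t'$ that restrict to the same $t$. The following is a minimal sketch under the same assumptions as before (finite-support dictionaries, lattice join for $\bigvee$); the data and the helper names are hypothetical.

```python
from typing import Callable, Dict, Hashable, NamedTuple, Set, Tuple

Row = Tuple[Tuple[str, Hashable], ...]
CRelation = Dict[Row, Dict[str, object]]


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    join: Callable   # lattice join used for the big join in (91)
    bottom: object   # 0_a


def project(A: CRelation, V: Set[str], schema: Dict[str, Lattice]) -> CRelation:
    """(π_V A)(t)(a): join of A(t')(a) over all t' with t' ↓ V = t and A(t')(a) ≠ 0_a."""
    result: CRelation = {}
    for t_prime, ann in A.items():
        t = tuple((a, v) for a, v in t_prime if a in V)   # restriction t' ↓ V
        acc = result.setdefault(t, {a: schema[a].bottom for a in V})
        for a in V:
            if ann[a] != schema[a].bottom:                # index set of (91)
                acc[a] = schema[a].join(acc[a], ann[a])
    return result


FUZZY = Lattice(max, 0.0)
SCHEMA = {"name": FUZZY, "city": FUZZY}
A = {row(name="Ann", city="Oslo"): {"name": 0.7, "city": 0.2},
     row(name="Ann", city="Rome"): {"name": 0.4, "city": 0.9},
     row(name="Bob", city="Rome"): {"name": 0.5, "city": 0.0}}
print(project(A, {"name"}, SCHEMA))
# {(('name', 'Ann'),): {'name': 0.7}, (('name', 'Bob'),): {'name': 0.5}}
```

Skipping $\mathbf{0}\_a$ does not change a lattice join; it is kept only to mirror the index set written in (91).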

**Selection:** *If A*: *U-Tup* → C(*U*)*-Tup and the selection predicate* **P**: *U-Tup* → C(*U*)*-Tup maps each U-tuple to an element of* C(*U*)*-Tup, then σ*<sub>**P**</sub>*A*: *U-Tup* → C(*U*)*-Tup is defined by*

$$(\sigma\_{\mathbf{P}} A)(t)(a) \stackrel{\text{def}}{=} A(t)(a) \triangle\_a \mathbf{P}(t)(a). \tag{92}$$
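In (92) the predicate returns a whole annotation tuple, so a tuple can satisfy the selection to a different degree on each attribute. The sketch below assumes the finite-support encoding and takes $\triangle\_a$ to be the lattice meet (min on [0, 1]); the predicate `in_oslo` and its degrees are hypothetical.

```python
from typing import Callable, Dict, Hashable, NamedTuple, Tuple

Row = Tuple[Tuple[str, Hashable], ...]
Annotation = Dict[str, object]
CRelation = Dict[Row, Annotation]


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    meet: Callable   # plays the role of △_a


def select(A: CRelation, P: Callable[[Row], Annotation],
           schema: Dict[str, Lattice]) -> CRelation:
    """(σ_P A)(t)(a) = A(t)(a) △_a P(t)(a); tuples outside A's support stay at 0_a."""
    return {t: {a: schema[a].meet(l, P(t)[a]) for a, l in ann.items()}
            for t, ann in A.items()}


def in_oslo(t: Row) -> Annotation:
    """Hypothetical fuzzy predicate: full degree for Oslo, 0.2 for any other city."""
    return {"name": 1.0, "city": 1.0 if dict(t)["city"] == "Oslo" else 0.2}


SCHEMA = {"name": Lattice(min), "city": Lattice(min)}
A = {row(name="Ann", city="Oslo"): {"name": 0.9, "city": 0.8},
     row(name="Bob", city="Rome"): {"name": 0.6, "city": 0.7}}
S = select(A, in_oslo, SCHEMA)
```

Only tuples in the support of `A` are visited, since $\mathbf{0}\_a$ absorbs under both a lattice meet and a semiring product.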

**Join:** *Let* $\mathcal{C}\_1 = \lbrace a\_1 : \mathcal{L}\_1, \dots, a\_n : \mathcal{L}\_n \rbrace$ *and* $\mathcal{C}\_2 = \lbrace b\_1 : \mathcal{L}'\_1, \dots, b\_m : \mathcal{L}'\_m \rbrace$ *be annotation schemata. If* $A : U\_1\text{-Tup} \to \mathcal{C}\_1(U\_1)\text{-Tup}$ *and* $B : U\_2\text{-Tup} \to \mathcal{C}\_2(U\_2)\text{-Tup}$*, then* $A \bowtie B$ *is the* $(\mathcal{C}\_1 \cup \mathcal{C}\_2)$*-relation over* $U\_1 \cup U\_2$ *defined as follows.*

$$(A \bowtie B)(t)(a) \overset{\text{def}}{=} \begin{cases} A(t \downarrow \mathcal{U}\_1)(a) & \text{if } a \in \mathcal{U}\_1 - \mathcal{U}\_2 \\ B(t \downarrow \mathcal{U}\_2)(a) & \text{if } a \in \mathcal{U}\_2 - \mathcal{U}\_1 \,. \end{cases} \tag{93}$$
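Equation (93) as printed covers only attributes that occur in exactly one of $U\_1$ and $U\_2$. The sketch below follows those two cases and adds two assumptions of its own, labeled in the code: only tuples formed from a pair $t\_1, t\_2$ of the two supports that agree on $U\_1 \cap U\_2$ are materialized, and a shared attribute combines both annotations with the meet. All names and data are hypothetical.

```python
from typing import Callable, Dict, Hashable, NamedTuple, Set, Tuple

Row = Tuple[Tuple[str, Hashable], ...]
CRelation = Dict[Row, Dict[str, object]]


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    meet: Callable


def natural_join(A: CRelation, U1: Set[str], B: CRelation, U2: Set[str],
                 schema: Dict[str, Lattice]) -> CRelation:
    """Attribute-wise join following (93) for attributes in only one schema.

    Assumptions not stated by (93): only tuples built from a pair (t1, t2)
    in the two supports that agree on U1 ∩ U2 are materialized, and shared
    attributes combine both annotations with the meet.
    """
    shared = U1 & U2
    result: CRelation = {}
    for t1, ann1 in A.items():
        for t2, ann2 in B.items():
            d1, d2 = dict(t1), dict(t2)
            if any(d1[a] != d2[a] for a in shared):
                continue                      # t1 and t2 do not join
            t = tuple(sorted({**d1, **d2}.items()))
            result[t] = {}
            for a in U1 | U2:
                if a in shared:               # assumed case a ∈ U1 ∩ U2
                    result[t][a] = schema[a].meet(ann1[a], ann2[a])
                elif a in U1:                 # a ∈ U1 − U2: A(t ↓ U1)(a)
                    result[t][a] = ann1[a]
                else:                         # a ∈ U2 − U1: B(t ↓ U2)(a)
                    result[t][a] = ann2[a]
    return result


FUZZY = Lattice(min)
SCHEMA = {"name": FUZZY, "city": FUZZY, "country": FUZZY}
A = {row(name="Ann", city="Oslo"): {"name": 0.9, "city": 0.6}}
B = {row(city="Oslo", country="NO"): {"city": 0.8, "country": 1.0},
     row(city="Rome", country="IT"): {"city": 0.5, "country": 0.9}}
J = natural_join(A, {"name", "city"}, B, {"city", "country"}, SCHEMA)
```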

**Difference:** *If A*, *B*: *U-Tup* → C(*U*)*-Tup, then A* \ *B*: *U-Tup* → C(*U*)*-Tup is defined by*

$$(A \backslash B)(t)(a) \stackrel{\text{def}}{=} A(t)(a) -\_a B(t)(a). \tag{94}$$
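For the difference (94), the sketch below uses the monus variant of $-\_a$, assuming the usual definition $x \,\dot{-}\, y = $ the least $z$ with $x \le y \nabla z$. With join as $\nabla$ this reduces to "$x$ if $x > y$, else 0" on the chain [0, 1] and to set difference on a powerset lattice, so the two attributes below (one fuzzy, one annotated by hypothetical source sets) use different monus operations. The helper names are illustrative.

```python
from typing import Callable, Dict, Hashable, NamedTuple, Tuple

Row = Tuple[Tuple[str, Hashable], ...]
CRelation = Dict[Row, Dict[str, object]]


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


class Lattice(NamedTuple):
    monus: Callable   # plays the role of −_a
    bottom: object    # 0_a


def chain_monus(x: float, y: float) -> float:
    """Least z with x <= max(y, z) on the chain [0, 1]."""
    return x if x > y else 0.0


def set_monus(x: frozenset, y: frozenset) -> frozenset:
    """Least Z with x ⊆ y ∪ Z in a powerset lattice, i.e. set difference."""
    return x - y


def difference(A: CRelation, B: CRelation, schema: Dict[str, Lattice]) -> CRelation:
    """(A \\ B)(t)(a) = A(t)(a) −_a B(t)(a); tuples outside A's support stay at 0_a."""
    result: CRelation = {}
    for t, ann in A.items():
        other = B.get(t, {})
        result[t] = {a: schema[a].monus(l, other.get(a, schema[a].bottom))
                     for a, l in ann.items()}
    return result


SCHEMA = {"name": Lattice(chain_monus, 0.0), "city": Lattice(set_monus, frozenset())}
r = row(name="Ann", city="Oslo")
A = {r: {"name": 0.8, "city": frozenset({"s1", "s2"})}}
B = {r: {"name": 0.9, "city": frozenset({"s2"})}}
D = difference(A, B, SCHEMA)
```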

**Renaming:** *If A*: *U-Tup* → C(*U*)*-Tup and β* : *U* → *U*′ *is a bijection, then ρ*<sub>*β*</sub>*A*: *U*′*-Tup* → C(*U*′)*-Tup is defined by*

$$(\rho\_{\beta}A)(t)(a) \stackrel{\text{def}}{=} A(t)(\beta(a)). \tag{95}$$
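A renaming only relabels attributes, so no lattice operation is involved. The sketch below assumes that renaming relabels attributes consistently in a tuple and in its annotation tuple, so that the value and annotation stored under attribute $b$ in $A$ appear under $\beta(b)$ in $\rho\_\beta A$; this is one reading of (95), and the helper names and data are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Row = Tuple[Tuple[str, Hashable], ...]
CRelation = Dict[Row, Dict[str, object]]


def row(**values: Hashable) -> Row:
    return tuple(sorted(values.items()))


def rename(A: CRelation, beta: Dict[str, str]) -> CRelation:
    """Relabel attribute b as beta[b] in every tuple and in its annotation tuple."""
    if len(set(beta.values())) != len(beta):
        raise ValueError("beta must be injective (a bijection onto its image)")
    result: CRelation = {}
    for t, ann in A.items():
        new_t = tuple(sorted((beta[b], v) for b, v in t))
        result[new_t] = {beta[b]: l for b, l in ann.items()}
    return result


A = {row(name="Ann", city="Oslo"): {"name": 0.9, "city": 0.4}}
R = rename(A, {"name": "person", "city": "town"})
```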

Relational algebra RA<sup>C</sup> still satisfies *all* the main positive relational algebra identities.

**Proposition 5.5** (Identities of C-relations)**.** *The following identities hold for the relational algebra on* C*-relations:*


*Proof.* If the lattice join and meet are chosen to model the positive relational operations, the above identities are implied by Proposition 4.1. If, on the other hand, some other semiring sum and product operations are chosen, the identities are implied by Proposition 2.1.

The properties of relational difference are implied by the identities involving $\dot{-}$ (see Proposition 5.1) and/or the identities involving ÷ (see Proposition 5.2), depending on the choice we make.

#### **6. Conclusion**

Although the attribute-annotated approach has many advantages, it also has some disadvantages. First, annotating every attribute clearly requires more storage than simple tuple-level annotation. Second, since the proposed general annotation structure, complete distributive lattices with finite meets distributing over arbitrary joins, need not be linearly ordered, it is not always possible to order tuples by decreasing annotation value. Even if each lattice used in an annotation schema is linearly ordered, there is not necessarily a linear order on the annotation tuples. Hence, it may not be possible to list query answers (tuples) in decreasing order of relevance. A suitable ordering of tuples can, however, be established as soon as the lattice of annotation values, $\mathcal{L} = (L, \vee, \wedge, \mathbf{0}, \mathbf{1})$, is *graded* (Stanley, 1997). Recall that a graded or ranked poset is a partially ordered set equipped with a rank function $\rho : L \to \mathbb{Z}$ that is compatible with the ordering, i.e. $\rho(x) < \rho(y)$ whenever $x < y$, and such that $\rho(y) = \rho(x) + 1$ whenever $y$ covers $x$. Graded posets can be visualized by means of a Hasse diagram. Examples of graded posets are the natural numbers with the usual order, the Cartesian product of two or more sets of natural numbers with the product order, ranked by the sum of the coordinates, and the Boolean lattice of finite subsets of a set, ranked by the number of elements in the subset. Notice, however, that the ranking problem simply reflects a fact about ordered structures and not a flaw in the model.

The work on attribute-annotated models is very new and, as far as we know, has not yet been implemented (Hajdinjak & Bierman, 2011). A prototype implementation on top of existing relational database management systems is therefore expected in the short term. Another direction for future research is the study of standard issues from relational databases in the general setting, including data dependencies, redundancy, normalization, database design, and optimization.

#### **7. References**

Aceto, L.; Ésik, Z. & Ingólfsdóttir, A. (2001). *Equational Theories of Tropical Semirings*, BRICS Report Series RS-01-21, University of Aarhus, Denmark.

Amer, K. (1984). Equationally Complete Classes of Commutative Monoids with Monus. *Algebra Universalis*, Vol. 18, No. 1, Jan 1984, pp. 129–131, ISSN 0002-5240.

Belohlávek, R. & Vychodil, V. (2006). Relational Model of Data Over Domains with Similarities: An Extension for Similarity Queries and Knowledge Extraction, *Proceedings of the 2006 IEEE International Conference on Information Reuse and Integration*, pp. 207–213, ISBN 0-7803-9788-6, Waikoloa, USA, Sept 2006, IEEE Press, Piscataway, USA.

Bordogna, G. & Psaila, G. (2006). *Flexible Databases Supporting Imprecision and Uncertainty*, Springer-Verlag, Berlin Heidelberg, Germany.

Bosbach, B. (1965). Komplementäre Halbgruppen: Ein Beitrag zur instruktiven Idealtheorie kommutativer Halbgruppen. *Mathematische Annalen*, Vol. 161, No. 4, Dec 1965, pp. 279–

Buneman, P.; Khanna, S. & Tan, W. C. (2001). Why and Where: A Characterization of Data Provenance, *Proceedings of the 8th International Conference on Database Theory*, pp. 316–330, ISBN 3-540-41456-8, London, UK, Jan 2001, Springer-Verlag, London, UK.

Calì, A.; Lembo, D. & Rosati, R. (2003). On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases, *Proceedings of the 22nd Symposium on Principles of Database Systems*, pp. 260–271, ISBN 1-58113-670-6, San Diego, USA, June 2003, ACM Press, New York, USA.

Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. *Communications of the ACM*, Vol. 13, No. 6, June 1970, pp. 377–387, ISSN 0001-0782.

Cui, Y.; Widom, J. & Wiener, J. L. (2000). Tracing the Lineage of View Data in a Warehousing Environment. *ACM Transactions on Database Systems*, Vol. 25, No. 2, June 2000, pp. 179–

Davey, B. A. & Priestley, H. A. (1990). *Introduction to Lattices and Order*, Cambridge University Press, Cambridge, UK.

Garcia-Molina, H.; Ullman, J. & Widom, J. (2008). *Database Systems: The Complete Book*, 2nd edition, Prentice Hall, New York, USA.

Geerts, F. & Poggi, A. (2010). On Database Query Languages for *K*-Relations. *Journal of Applied Logic*, Vol. 8, No. 2, June 2010, pp. 173–185, ISSN 1570-8683.

Green, T. J.; Karvounarakis, G. & Tannen, V. (2007). Provenance Semirings, *Proceedings of the 26th Symposium on Principles of Database Systems*, pp. 31–40, ISBN 978-1-59593-685-1, Beijing, China, June 2007, ACM Press, New York, USA.

Hajdinjak, M. & Mihelič, F. (2006). The PARADISE Evaluation Framework: Issues and Findings. *Computational Linguistics*, Vol. 32, No. 2, June 2006, pp. 263–272, ISSN 0891-2017.

Hajdinjak, M. & Bauer, A. (2009). Similarity Measures for Relational Databases. *Informatica*, Vol. 33, No. 2, May 2009, pp. 135–141, ISSN 0350-5596.

Hajdinjak, M. & Bierman, G. M. (2011). Extending Relational Algebra with Similarities. To appear in *Mathematical Structures in Computer Science*, ISSN 0960-1295.

Hjaltason, G. R. & Samet, H. (2003). Index-Driven Similarity Search in Metric Spaces. *ACM Transactions on Database Systems*, Vol. 28, No. 4, Dec 2003, pp. 517–580, ISSN 0362-5915.

Hutton, B. (1975). Normality in Fuzzy Topological Spaces. *Journal of Mathematical Analysis and Applications*, Vol. 50, No. 1, April 1975, pp. 74–79, ISSN 0022-247X.

Imielinski, T. & Lipski, W. (1984). Incomplete Information in Relational Databases. *Journal of the ACM*, Vol. 31, No. 4, Oct 1984, pp. 761–791, ISSN 0004-5411.

Jae, Y. L. & Elmasri, R. A. (2001). A Temporal Algebra for an ER-Based Temporal Data Model, *Proceedings of the 17th International Conference on Data Engineering*, pp. 33–40, ISBN 0-7695-1001-9, Heidelberg, Germany, April 2001, IEEE Computer Society, Washington, USA.

Ma, Z. (2006). *Studies in Fuzziness and Soft Computing: Fuzzy Database Modeling of Imprecise and Uncertain Engineering Information*, Vol. 195, Springer-Verlag, Berlin Heidelberg, Germany.

Ma, Z. & Yan, L. (2008). A Literature Overview of Fuzzy Database Models. *Journal of Information Science and Engineering*, Vol. 24, No. 1, Jan 2008, pp. 189–202, ISSN 1016-2364.


**Section 2** 

**Representations** 

