**2. Material and methods**

#### *Conventions*

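A data model developed in the framework of the DSS Determinator includes the following tables:

**•** List of features, with image file names and descriptions,

**•** Groups of features, with names and descriptions,

**•** List of targets, with image file names, descriptions and labels,

**•** Match table, with the features on the rows and the targets on the columns,

**•** Tree information per node, with descriptions and image file names.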

A data model consists of *n* features (denoted by *i*, *j*) to describe *m* objects (targets in the terminology of Determinator, denoted by *p*, *q*, *r*, *s*). Every feature consists of two or more feature states (*k*, *l*, and *K* for composite features). The basic principles are defined using first-order logic [4, 7].

#### **2.1. Logic basis**

#### *Free access key*

Every cell in the matrix *Target x Features* contains a decision rule. These decision rules describe the logical relationship between the feature states and the targets by specifying the valid feature states for each target. A feature state can apply to one or more targets, which implies that the relation need not be unique:

$$F_{i,k} \Rightarrow \{T_p \lor \cdots \lor T_s\} \tag{1}$$


Reliability and Evaluation of Identification Models Exemplified by a Histological Diagnosis Model


http://dx.doi.org/10.5772/51362


with *Fi,k* as feature state *k* of the *i*-th feature, and *Tp* ... *Ts* as a series of targets which can be assigned individually. Conversely, combining more specific feature states can limit the choice of targets:

$$F_{i,k} \land F_{j,l} \Rightarrow T_p \tag{2}$$

with *Fi,k* as feature state *k* of the *i*-th feature, *Fj,l* as feature state *l* of the *j*-th feature, and *Tp* as target.

The use of different states of a feature can add to the separation capability of that feature. Assuming three feature states:

$$\{ (F_{i,1} \lor F_{i,2} \Rightarrow p) \land (F_{i,2} \lor F_{i,3} \Rightarrow q) \} \Rightarrow (F_{i,2} \Rightarrow p \lor q) \tag{3}$$

In this logic distribution, feature state *Fi,1* identifies exclusively target *p*, and feature state *Fi,3* identifies exclusively target *q*, but feature state *Fi,2* can either identify target *p* or target *q*. This dual relationship can be indicated as *overlap* in a Venn diagram.
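A minimal Python sketch of the decision rules in equations (1) to (3) follows. It assumes a hypothetical representation in which each target lists, per feature, the set of feature states valid for it, with states coded as integers; the names `MATCH_TABLE` and `candidate_targets` are illustrative and not part of Determinator.

```python
from typing import Dict, Set

# Hypothetical match table: target -> feature -> set of valid feature states.
# Feature "i" reproduces equation (3): state 1 is valid only for p, state 3
# only for q, and state 2 for both (the overlap in the Venn diagram).
MATCH_TABLE: Dict[str, Dict[str, Set[int]]] = {
    "p": {"i": {1, 2}, "j": {1}},
    "q": {"i": {2, 3}, "j": {2}},
}


def candidate_targets(answers: Dict[str, int]) -> Set[str]:
    """Return the targets whose decision rules accept every answered state.

    A single answer behaves like equation (1): one state may imply several
    targets. Several answers are combined by conjunction, as in equation (2).
    """
    return {
        target
        for target, rules in MATCH_TABLE.items()
        if all(state in rules[feature] for feature, state in answers.items())
    }


print(sorted(candidate_targets({"i": 1})))          # ['p']
print(sorted(candidate_targets({"i": 2})))          # ['p', 'q'] (overlap)
print(sorted(candidate_targets({"i": 2, "j": 2})))  # ['q']
```

State 2 of feature *i* returns both targets, which is the overlap of equation (3); adding an answer for feature *j* applies the conjunction of equation (2) and leaves a single target.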

The DSS Determinator allows the user to choose a subject for identification and to answer a range of questions corresponding to the *n* features available in the model. Every possible answer represents a certain feature state *k*. The match between the chosen subject (represented by the answers given) and a target *p* is calculated by summing up all the true relationships between the chosen feature states *Fi,k* and the defined target *p*:

$$\sum_{i=1}^{n} \left( \left[ F_{i,k} \Rightarrow T_p \right] \cdot W_i \right) \tag{4}$$

with *Wi* as weighting factor for feature *i*; the bracketed term counts 1 when the relationship is true and 0 otherwise. The sums for all targets are presented as *Match percentages* in the output of the system and listed in descending order.
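The weighted sum of equation (4) can be sketched as follows. The match table, the weights and the function `match_percentages` are hypothetical, and so is the conversion to a percentage: the sketch divides each sum by the total weight of the answered features, which the text does not specify.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical match table (target -> feature -> valid states) and weights W_i.
MATCH_TABLE: Dict[str, Dict[str, Set[int]]] = {
    "p": {"i": {1, 2}, "j": {1}},
    "q": {"i": {2, 3}, "j": {2}},
}
WEIGHTS: Dict[str, float] = {"i": 1.0, "j": 2.0}


def match_percentages(answers: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank targets by the weighted sum of true relationships (equation (4)).

    Assumption: each sum is expressed as a percentage of the total weight of
    the answered features; the normalisation is not given in the text.
    """
    total_weight = sum(WEIGHTS[feature] for feature in answers)
    if total_weight == 0:
        return []
    results = []
    for target, rules in MATCH_TABLE.items():
        # Every true relationship F_{i,k} => T_p contributes its weight W_i.
        score = sum(WEIGHTS[f] for f, k in answers.items() if k in rules[f])
        results.append((target, 100.0 * score / total_weight))
    # Descending order of match percentage, as in the system's output.
    return sorted(results, key=lambda item: item[1], reverse=True)


print(match_percentages({"i": 2, "j": 1}))  # p: 100.0, q: about 33.3
```

With all weights equal to 1, the percentage in this sketch reduces to the share of answered features whose states are valid for the target.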

#### *Single access key*


52 Decision Support Systems


A typical dichotomous tree consists of nodes (lemmas), which can point to either two targets (leaves), two nodes (branches) or a combination of the two. Basically, every lemma in a tree is based on the decision rule:

$$F_{i,K} \Rightarrow P(x), \quad \neg F_{i,K} \Rightarrow \neg P(x) \tag{5}$$

The functions *P(x)* and *¬P(x)* can describe a target or a further node. The structure of a dichotomous key can be defined as:

**Figure 1.** A hypothetical tree with two features indicated by the functions P and Q and three targets.

The combined feature state *Fi,K* can combine more than one simple state, e.g. *k* and *l*. Determinator allows the user to construct a tree in which a node can point to a node in another part of the tree, and in which more than one node can point to the same target *Tp*.
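A dichotomous key built from lemmas of the form of equation (5) can be sketched as a small linked structure. In this hypothetical Python version, each `Node` tests whether the subject's state of one feature lies in a combined state set *K*, and each branch leads either to a further node or to a target name. Because branches are plain references, one node can be reached from several parts of the tree and several leaves can name the same target, as Determinator allows; the example tree is an assumption, not a transcription of Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class Node:
    """A lemma: tests whether the state of `feature` lies in `states`."""

    feature: str
    states: FrozenSet[int]   # the combined feature state F_{i,K}
    yes: Union["Node", str]  # branch for F_{i,K} => P(x): a node or a target
    no: Union["Node", str]   # branch for not F_{i,K} => not P(x)


def identify(node: Union[Node, str], subject: Dict[str, int]) -> str:
    """Follow the key from `node` until a leaf (a target name) is reached.

    Nodes are plain references, so a sub-node may be shared by several
    parents; the walk assumes the structure contains no cycles.
    """
    while isinstance(node, Node):
        node = node.yes if subject[node.feature] in node.states else node.no
    return node


# Hypothetical key with two lemmas (P tests feature "i", Q tests "j") and
# three targets A, B and C.
q_lemma = Node("j", frozenset({1}), yes="B", no="C")
root = Node("i", frozenset({1, 2}), yes="A", no=q_lemma)

print(identify(root, {"i": 1, "j": 3}))  # A
print(identify(root, {"i": 3, "j": 1}))  # B
print(identify(root, {"i": 3, "j": 2}))  # C
```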

#### **2.2. Quality parameters**

The following parameters for validation of data models are being developed and evaluated in the framework of this paper.

#### *Redundancy*

Overlap between the areas of two targets exists when the variability range of target *p* overlaps with the variability range of target *q* for the same feature (see Figure 2, targets B and C; equation (3)). The overlap between the area of target *p* and that of target *q* is the sum of the overlap regions for all features. Denoting the range of feature states that apply to target *p* as {*Fi,pmin*, *Fi,pmax*} and the range of feature states that apply to target *q* as {*Fi,qmin*, *Fi,qmax*}, then:

*Fi,a* = smallest {*Fi,pmax* , *Fi,qmax* } (upper limit of the overlap region)

*Fi,b* = largest { *Fi,pmin* , *Fi,qmin* } (lower limit of the overlap region)

*mini* = smallest {*Fi,pmin* , *Fi,qmin* } (lower limit of the feature state range of both targets)

*maxi* = largest {*Fi,pmax* , *Fi,qmax* } (upper limit of the feature state range of both targets)

Overlap per feature:

$$r_{i,p,q} = \frac{F_{i,a} - F_{i,b} + 1}{\max_i - \min_i + 1}; \quad r_{i,p,q} \ge 0 \tag{6}$$
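Equation (6) can be illustrated with a short Python function. The sketch assumes that the states of a feature are ordered and coded as consecutive integers, so that a target's valid states form a range {min, max}, and it interprets the condition *ri,p,q* ≥ 0 as clipping negative values (disjoint ranges) to zero. The function name `feature_overlap` and the state coding are hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # (min, max) index of the ordered states of one feature


def feature_overlap(p: Range, q: Range) -> float:
    """Overlap r_{i,p,q} of targets p and q for a single feature (eq. (6))."""
    p_min, p_max = p
    q_min, q_max = q
    f_a = min(p_max, q_max)   # F_{i,a}: upper limit of the overlap region
    f_b = max(p_min, q_min)   # F_{i,b}: lower limit of the overlap region
    low = min(p_min, q_min)   # min_i: lower limit of both ranges
    high = max(p_max, q_max)  # max_i: upper limit of both ranges
    # Assumption: "r >= 0" is read as clipping disjoint ranges to zero.
    return max(0.0, (f_a - f_b + 1) / (high - low + 1))


# States coded as none = 1, mild = 2, severe = 3 (hypothetical coding).
print(feature_overlap((1, 2), (2, 3)))  # one shared state out of three
print(feature_overlap((1, 1), (3, 3)))  # 0.0: disjoint ranges
```

The "+ 1" terms count states rather than distances, so two adjacent but disjoint ranges such as (1, 1) and (2, 2) also give zero.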


Average overlap over all features between two targets *p* and *q*:

$$R_{p,q} = \left( \sum_{i=1}^{n} \frac{F_{i,a} - F_{i,b} + 1}{\max_i - \min_i + 1} \right) \Big/ n \tag{7}$$

The average redundancy of the total data model is the average overlap over every combination of two targets *p* and *q*. There are *m* (*m* − 1) / 2 different combinations of targets.

Average redundancy:

$$R_{tot} = \left( \sum_{p=1}^{m} \left( \sum_{q=1}^{p-1} R_{p,q} \right) \right) \Big/ \frac{m(m-1)}{2} \cdot 100 \tag{8}$$

The smaller the average redundancy, the smaller the chance that a certain range of feature states of a chosen subject will result in two or more match percentages of 100% (according to (4); see subject 3 in Figure 2). Redundancy is related to the correlation coefficients among features.

**Figure 2.** A hypothetical variation space with five targets A-E and four user-chosen subjects. 1: subject outside the total variation space of the data model, a 100% match is impossible; 2: subject inside the total variation space of the data model but without a fit to any of the targets, a 100% match will not occur; 3: subject in the overlap of the variation of two or more targets, two or more 100% matches will result; 4: subject in one and only one variation space of a target, one 100% match will be found.
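Equations (7) and (8) can be sketched in Python on a small hypothetical model with three targets and two ordinal features. The per-feature overlap of equation (6) is included so that the block is self-contained, under the assumptions of ordered integer states and clipping at zero; the function names and the model are illustrative only.

```python
from itertools import combinations
from typing import Dict, Tuple

Range = Tuple[int, int]    # (min, max) state index of one feature for a target
Target = Dict[str, Range]  # feature name -> range of valid states


def feature_overlap(p: Range, q: Range) -> float:
    """r_{i,p,q} of equation (6); negative values are clipped to zero."""
    f_a, f_b = min(p[1], q[1]), max(p[0], q[0])
    return max(0.0, (f_a - f_b + 1) / (max(p[1], q[1]) - min(p[0], q[0]) + 1))


def pair_redundancy(p: Target, q: Target) -> float:
    """R_{p,q}: average overlap over the n features (equation (7))."""
    return sum(feature_overlap(p[f], q[f]) for f in p) / len(p)


def total_redundancy(model: Dict[str, Target]) -> float:
    """R_tot in percent: mean R_{p,q} over the m(m-1)/2 pairs (equation (8))."""
    pairs = list(combinations(model.values(), 2))
    return 100.0 * sum(pair_redundancy(p, q) for p, q in pairs) / len(pairs)


# Hypothetical three-target model with two ordinal features x and y.
MODEL: Dict[str, Target] = {
    "A": {"x": (1, 1), "y": (1, 2)},
    "B": {"x": (2, 3), "y": (2, 3)},
    "C": {"x": (2, 3), "y": (3, 3)},
}
print(pair_redundancy(MODEL["B"], MODEL["C"]))  # 0.75
print(total_redundancy(MODEL))
```

In this toy model B and C share most of their variability space, whereas A and C do not overlap on any feature (R = 0).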

#### *Uniqueness*


The capability to distinguish between two targets *p* and *q* depends on the presence of at least one feature with unique variability ranges for each of the two targets. If overlapping regions exist for all features, a set of feature states describing a chosen subject may show a full match with more than one target. So, two targets *p* and *q* can be uniquely differentiated if and only if a feature *i* exists for which no state identifies target *p* as well as target *q*:

$$U_{p,q} \iff \exists\, i : P\left( \forall\, k : Q\left( F_{i,k} \Rightarrow T_p \land F_{i,k} \nRightarrow T_q \right) \right) \tag{9}$$

This can be rewritten as:

$$U_{p,q} \iff \exists\, i : P\left( r_{i,p,q} = 0 \right) \tag{9b}$$

with: *Up,q* = TRUE: the two targets *p* and *q* have non-overlapping feature ranges for at least one feature *i*; there is at least one value *ri,p,q* equalling zero (equation (6)), and at least one feature is indicated red in the menu option Compare of Determinator.

*Up,q* = FALSE: the two targets *p* and *q* have overlapping ranges for all features; there is no value *ri,p,q* equalling zero (equation (6)), and no feature is indicated red in the menu option Compare of Determinator.

If the distinction between two targets is based on only one feature *i* with a value *ri,p,q* equalling zero (no overlap), then the distinction could be considered weak. Targets A, C and E in Figure 2 can be distinguished along the *X*-axis, targets A and B, and targets D and E, can be distinguished along the *Y*-axis, whereas targets B and C can be distinguished along neither the *X*-axis nor the *Y*-axis. If the group containing the distinctive feature is disabled, Determinator could return a full match for more than one target in a query.
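The uniqueness test of equation (9b), together with the notion of a weak distinction, can be sketched as below. On ordered integer states, *ri,p,q* = 0 exactly when the overlap region is empty, that is when *Fi,a* < *Fi,b*, so the sketch tests that condition directly. The target model and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]    # (min, max) state index of one feature for a target
Target = Dict[str, Range]  # feature name -> range of valid states


def is_disjoint(p: Range, q: Range) -> bool:
    """True when r_{i,p,q} = 0, i.e. F_{i,a} < F_{i,b} (empty overlap)."""
    return min(p[1], q[1]) < max(p[0], q[0])


def distinguishing_features(p: Target, q: Target) -> List[str]:
    """Features with r_{i,p,q} = 0 (shown red under Compare, per the text)."""
    return [f for f in p if is_disjoint(p[f], q[f])]


def uniqueness(p: Target, q: Target) -> str:
    """Classify a pair following equation (9b) and the 'weak' case."""
    found = distinguishing_features(p, q)
    if not found:
        return "not unique"  # U_{p,q} = FALSE
    return "weak" if len(found) == 1 else "unique"


# Hypothetical three-target model with two ordinal features x and y.
MODEL: Dict[str, Target] = {
    "A": {"x": (1, 1), "y": (1, 2)},
    "B": {"x": (2, 3), "y": (2, 3)},
    "C": {"x": (2, 3), "y": (3, 3)},
}
print(uniqueness(MODEL["A"], MODEL["B"]))  # weak
print(uniqueness(MODEL["A"], MODEL["C"]))  # unique
print(uniqueness(MODEL["B"], MODEL["C"]))  # not unique
```

Here A and B are separated by feature x alone (a weak distinction), A and C by both features, and B and C by neither, analogous to targets B and C in Figure 2.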

#### *Separation capability*

A data model can uniquely identify every target if and only if every combination of two targets *p* and *q* can be described with *Up,q* = TRUE:

$$D_{tot} = \left( \sum_{p=1}^{m} \left( \sum_{q=1}^{p-1} \left[ U_{p,q} = \text{TRUE} \right] \right) \right) \Big/ \frac{m(m-1)}{2} \cdot 100 \tag{10}$$
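Equation (10) counts the target pairs for which *Up,q* is TRUE. The hypothetical Python sketch below, assuming ordered integer feature states, also lists the non-distinguishable pairs (synonyms), which are what pull *Dtot* below 100%. The function names and the model are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Range = Tuple[int, int]    # (min, max) state index of one feature for a target
Target = Dict[str, Range]  # feature name -> range of valid states


def unique_pair(p: Target, q: Target) -> bool:
    """U_{p,q}: some feature has disjoint ranges (r_{i,p,q} = 0, eq. (9b))."""
    return any(min(p[f][1], q[f][1]) < max(p[f][0], q[f][0]) for f in p)


def separation_capability(model: Dict[str, Target]) -> float:
    """D_tot in percent: share of target pairs with U_{p,q} = TRUE (eq. (10))."""
    pairs = list(combinations(model.values(), 2))
    return 100.0 * sum(unique_pair(p, q) for p, q in pairs) / len(pairs)


def synonyms(model: Dict[str, Target]) -> List[Tuple[str, str]]:
    """Target pairs that no feature separates (the cause of D_tot < 100%)."""
    return [
        (a, b)
        for (a, p), (b, q) in combinations(model.items(), 2)
        if not unique_pair(p, q)
    ]


# Hypothetical three-target model with two ordinal features x and y.
MODEL: Dict[str, Target] = {
    "A": {"x": (1, 1), "y": (1, 2)},
    "B": {"x": (2, 3), "y": (2, 3)},
    "C": {"x": (2, 3), "y": (3, 3)},
}
print(separation_capability(MODEL))  # two of three pairs are unique
print(synonyms(MODEL))               # [('B', 'C')]
```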

A data model can be indicated as suboptimal or not valid when the differentiation coefficient *Dtot* is less than 100%. Whether or not a data model can be validated with a value for *Dtot* lower than 100% depends on whether non-distinguishable targets (synonyms) are intended to be present in the model.

#### *Coverage of variability space*

Every target occupies a part of the *n*-dimensional space defined by the data model. The share of a target in the total space is calculated as:

Coverage of space of a single target *p*:

$$o_p = \left( \prod_{i=1}^{n} \frac{s_{i,p}}{t_i} \right) \cdot 100 \tag{11}$$

with:

$$s_{i,p} = F_{i,p\max} - F_{i,p\min} + 1$$

$$t_i = F_{i,\max} - F_{i,\min} + 1$$

where *si,p* is the number of states of feature *i* that apply to target *p*, and *ti* is the total number of states of feature *i*.

If *Dtot* equals 100%, the sum of all individual target coverages indicates the total coverage of the variability space:

$$O_{tot} = \sum_{p=1}^{m} o_p \, ; \quad D_{tot} = 100\% \tag{12}$$

The larger the coverage of the total variability space, the smaller the chance that a certain range of values of a subject will result in no match percentage of 100% (according to (4)). If *Dtot* is smaller than 100%, equation (12) overestimates the coverage, because the spaces of non-distinguishable targets overlap and are counted more than once.

**3. Model development and application**

The diagnosis of illegal growth hormone use in veal calves will be used to illustrate model development and performance testing.

The use of illegal growth promoters, although prohibited in the European Union, is still part of current practice in animal farming. Adequate monitoring of the hormones is hampered by the fact that the hormone or hormone cocktail is metabolised or excreted within a few weeks. The effects of hormone use, however, can be seen in histologically stained sections of either the prostate (male calves) or the gland of Bartholin (female calves), prepared with different staining techniques. Monitoring by means of histological examination appears to be an important instrument in enforcing legislation on food safety and animal health [5, 6]. The interpretation of histological disorders requires a high level of expertise. An expert model has been developed in the framework of the DSS Determinator to support the user in identifying the extent of hormone treatment of veal calves. The different quality parameters will be illustrated after a further presentation of the model.

The data model consists of 13 features to identify a treatment level indicated as "normal", "suspect" or "positive". The features are presented in Table 1, and some of them are illustrated in Figure 3.

| Group | Number | Feature | States |
|---|---|---|---|
| I | 1 | Presence of metaplasia (male) | [none, mild, severe] |
| II | 2 | Ratio between ducts and glandular tissue | [normal, more\_ducts, mainly\_ducts] |
| II | 3 | Presence of metaplasia (female) | [none, mild, severe] |
| II | 4 | Combined presence of metaplasia and an elevated duct ratio | [no, yes] |
| III | 5 | Presence of hyperplasia | [none, mild, severe] |
| III | 6 | Presence of cysts | [none, mild, severe] |
| III | 7 | Presence of hypersecretion | [none, mild, severe] |
| III | 8 | Presence of vacuolisation | [none, mild, severe] |
| III | 9 | Presence of mucous cells | [none, moderate, severe] |
| III | 10 | Presence of inflammation | [none, moderate, severe] |
| III | 11 | Presence of folding in the urethra | [none, moderate, severe] |
| III | 12 | Presence of thickening in the urethra | [none, moderate, severe] |
| IV | 13 | Number of deviating features | [0,...,9] |

**Table 1.** Overview of histological features for identifying the level of hormone treatment. The number of deviating features (group IV) is the sum of features in group III and either group I (male) or group II (female) that have a state differing from "none". Feature 2 is excluded from this sum since it only applies to female animals.
