56 Decision Support Systems

along the *X*-axis nor the *Y*-axis. If the group with the distinctive feature is disabled, Determinator could give a full match for more than one target in a query.

*Separation capability*

A data model can identify every target uniquely if and only if every combination of two targets *p* and *q* can be described with *Up,q* = TRUE:

$$D_{tot} = \frac{\sum_{p=1}^{m} \sum_{q=1}^{p-1} \left(U_{p,q} = \mathrm{TRUE}\right)}{m \, (m-1) / 2} \cdot 100 \qquad (10)$$

A data model can be regarded as suboptimal or not valid when the differentiation coefficient *Dtot* is less than 100%. Whether a data model with a value for *Dtot* lower than 100% can be validated depends on whether non-distinguishable targets (synonyms) are intended to be present in the model.

*Coverage of variability space*

Every target possesses a part of the *n*-dimensional space defined by the data model. The share of a target in the total space is calculated as follows.

Coverage of space of a single target *p*:

$$o_p = \left( \prod_{i=1}^{n} \frac{s_{i,p}}{t_i} \right) \cdot 100 \qquad (11)$$

with:

$$s_{i,p} = F_{i,p\max} - F_{i,p\min} + 1$$

$$t_i = F_{i,\max} - F_{i,\min} + 1$$

In the situation of *Dtot* equalling 100%, the sum of all individual target coverages is an indication of the total coverage of the variability space:

$$O_{tot} = \sum_{p=1}^{m} o_p \; ; \quad D_{tot} = 100\% \qquad (12)$$

The larger the coverage of the total variability space, the smaller the chance that a certain range of values of a subject will not result in a match percentage of 100% (according to (4)). In the situation that *Dtot* is smaller than 100%, an overestimation occurs.

**3. Model development and application**

The diagnosis for illegal growth hormone use in veal calves will be used as illustration of model development and performance testing.
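Before turning to the example, the separation capability of equation (10) can be made concrete with a minimal sketch. The rule that decides whether two targets are distinguishable (equation (9)) is not reproduced here, so the function `disjoint_on_some_feature` is a hypothetical stand-in, and the target definitions are illustrative values rather than the published data model.

```python
from itertools import combinations


def disjoint_on_some_feature(a, b):
    """Hypothetical stand-in for U_pq: TRUE if some shared feature has no state in common."""
    return any(a[f].isdisjoint(b[f]) for f in a.keys() & b.keys())


def separation_capability(targets, distinguishable=disjoint_on_some_feature):
    """Return D_tot in percent, following the structure of equation (10)."""
    pairs = list(combinations(targets.values(), 2))  # m * (m - 1) / 2 unordered pairs
    if not pairs:
        raise ValueError("at least two targets are required")
    n_true = sum(1 for p, q in pairs if distinguishable(p, q))
    return 100.0 * n_true / len(pairs)


# Illustrative targets only (assumed state sets, not the published model).
targets = {
    "normal": {"metaplasia": {"none"}, "n_deviating": {0, 1, 2}},
    "suspect": {"metaplasia": {"none"}, "n_deviating": set(range(3, 9))},
    "positive": {"metaplasia": {"mild", "severe"}, "n_deviating": set(range(1, 10))},
}
d_tot = separation_capability(targets)

# An unintended synonym of "normal" cannot be separated from it,
# so one of the six pairs fails and D_tot drops below 100%.
with_synonym = dict(targets, normal_copy=targets["normal"])
d_tot_synonym = separation_capability(with_synonym)
```

Passing the pairwise rule as a parameter keeps the counting logic of equation (10) separate from whatever definition of *Up,q* a given model uses.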

The use of illegal growth promoters is, although prohibited in the European Union, still part of current practice in animal farming. Reasonable monitoring of the hormones is hampered by the fact that the hormone or hormone cocktail is metabolised or excreted within a period of a few weeks. The effects of the use of hormones, however, can be seen in histologically stained sections of either the prostate (male calves) or the gland of Bartholin (female calves) with different staining techniques. The monitoring by means of histological examinations appears to be an important instrument in maintaining legislation for food safety and animal health [5, 6]. The interpretation of histological disorders needs a high level of expertise. An expert model has been developed in the framework of the DSS Determinator to support the user in identifying the extent of hormone treatment of veal calves. The different quality parameters will be illustrated after a further presentation of the model.

The data model consists of 13 features to identify a treatment level indicated as "normal", "suspect" or "positive". The features are presented in Table 1, and some of them are illustrated in Figure 3.


**Table 1.** Overview of histological features for identifying the level of hormone treatment. The number of deviating features (group IV) is the sum of features in group III and either group I (male) or group II (female) that have a state differing from "none". Feature 2 is excluded from this sum since it only applies to female animals.


Reliability and Evaluation of Identification Models Exemplified by a Histological Diagnosis Model

http://dx.doi.org/10.5772/51362

59


There are two strategies to reach a diagnosis:

**A.** A quick, general diagnosis. Depending on the sex of the calf, selecting either feature groups I and IV (male) or groups II and IV (female) is sufficient.

**B.** An extended diagnosis. In addition to the feature groups as indicated in strategy A, group III is necessary.

The kernel of the data model consists of the groups I, II and IV to give a diagnosis of the treatment level. The diagnosis for possible hormone treatment in female calves is more complicated than for male calves. This is caused by the natural production of oestrogen hormones, which is lacking in male calves. The simple diagnosis <IF metaplasia=present THEN target positive> needs further support in female calves. A second diagnostic feature is used based on a larger share of ducts in the glandular tissue. The basic rule is then expanded to <IF metaplasia=present AND duct\_ratio=elevated THEN target positive>. For both male and female calves the diagnosis "suspect" is supported by the number of deviating features. The duct ratio is excluded from this feature since it applies only to female calves. The logic tables to diagnose the level of treatment are presented in Table 2.

| **male** | Metaplasia [mild,severe] | Metaplasia [none] |
|---|---|---|
| | "positive" | #=[0,1,2] → "normal"; #=[3,...,8] → "suspect" |

| **female** | Metaplasia [mild,severe] | Metaplasia [none] |
|---|---|---|
| Duct ratio [more,mainly] | "positive" | #=[0,1,2] → "normal"; #=[3,...,8] → "suspect" |
| Duct ratio [normal] | #=[1,2] → "normal"; #=[3,...,9] → "suspect" | #=[0,1,2] → "normal"; #=[3,...,8] → "suspect" |

**Table 2.** Logic tables for the diagnosis of hormone treatment in veal calves. #: total number of deviating features including the presence of metaplasia, excluding an elevated duct ratio.

The diagnoses as illustrated in Table 2 can be extended further by including the individual features of group III (Table 1). The number of deviating features (feature 13) needs to be adjusted accordingly. The basic rules are translated in a formal decision tree, as shown in Figure 4.

**Figure 4.** Decision tree for the diagnosis of hormone treatment in veal calves.

**Figure 3.** Normal appearance of prostate (left) and intensive presence of hypersecretion and cysts (right).
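The basic rules of Table 2 can be sketched as a small function. This is a minimal illustration, not the Determinator implementation: the function name `diagnose`, its parameter names and the string encoding of the states are assumptions, while the thresholds (0 to 2 deviating features for "normal", 3 or more for "suspect") are taken from Table 2.

```python
ELEVATED_DUCT_STATES = {"more", "mainly"}  # state labels as used in Table 2
METAPLASIA_PRESENT = {"mild", "severe"}


def diagnose(sex, metaplasia, n_deviating, duct_ratio="normal"):
    """Return "normal", "suspect" or "positive" following the logic of Table 2.

    n_deviating is the count '#': deviating features including metaplasia,
    excluding an elevated duct ratio. duct_ratio is ignored for male calves.
    """
    has_metaplasia = metaplasia in METAPLASIA_PRESENT
    if has_metaplasia and n_deviating < 1:
        # The count includes metaplasia, so it cannot be zero in this case.
        raise ValueError("n_deviating must be at least 1 when metaplasia is present")

    if sex == "male":
        decisive = has_metaplasia
    elif sex == "female":
        # Metaplasia alone is not decisive; an elevated duct ratio is also required.
        decisive = has_metaplasia and duct_ratio in ELEVATED_DUCT_STATES
    else:
        raise ValueError("sex must be 'male' or 'female'")

    if decisive:
        return "positive"
    return "suspect" if n_deviating >= 3 else "normal"
```

For example, `diagnose("female", "mild", 1, duct_ratio="normal")` gives "normal", whereas the same findings in a male calf give "positive", which mirrors the asymmetry between the two logic tables.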

Finally, the decision tree is used as the basis for a free access key. The importance and position of the feature indicating the presence of metaplasia is different for male and female diagnosis. For the latter, only the combination of metaplasia and elevated duct ratio is decisive for the diagnosis "positive". As a consequence, the presence of metaplasia is included twice in the free access key, as feature 1 (group I for male animals) and feature 3 (group II for female animals). The free access key was optimised by giving all features a suitable weighting factor. All features of group III were given a factor of one.

The performance of the model is tested in eight runs following the two strategies. The continuous feature 13 is varied between 0 and 9 in every run in combination with the appropriate choices for the other features, as follows:


A1 (male): groups I and IV are used. Choice for feature 1 is [none].

A2 (male): groups I and IV are used. Choice for feature 1 is [mild] unless:

$F_{13,k} = 0 \rightarrow F_{1,k} = [\text{none}]$.

A3 (female): groups II and IV are used. Choices for features 2 and 3 are [normal] and [none].

A4 (female): groups II and IV are used. Choices for features 2 and 3 are [more\_ducts] and [none].

A5 (female): groups II and IV are used. Choices for features 2 and 3 are [more\_ducts] and [mild] unless:

$F_{13,k} = 0 \rightarrow F_{3,k} = [\text{none}]$.

B1 (male): groups I, III and IV are used. Choice for feature 1 is [none].

B2 (male): groups I, III and IV are used. Choice for feature 1 is [mild] unless:

$F_{13,k} = 0 \rightarrow F_{1,k} = [\text{none}]$.

$F_{13,k} \geq 1 \rightarrow$ the appropriate number of features of group B and C get the state [mild] or [moderate].

B3 (female): groups II, III and IV are used. Choice for feature 3 is [mild] unless:

$F_{13,k} = 0 \rightarrow F_{3,k} = [\text{none}]$.

$F_{13,k} \geq 1 \rightarrow$ the appropriate number of features of group B and C get the state [mild] or [moderate].

The choice for [severe] instead of [mild] will give identical results except for the presence of hyperplasia (feature 5).
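The dependency rule in the strategy A runs (metaplasia is forced to [none] whenever feature 13 equals zero) can be sketched as follows. This is only an illustration of how the simulated subjects might be generated: the function `simulate_run` and the dictionary keys are hypothetical names, and the group III assignments of the strategy B runs are left out.

```python
def simulate_run(metaplasia_choice, duct_ratio=None, counts=range(10)):
    """Build one simulated subject per value of feature 13 (0 to 9 by default).

    When the number of deviating features is zero, metaplasia cannot be
    present, so its state is overridden to "none".
    """
    subjects = []
    for k in counts:
        metaplasia = "none" if k == 0 else metaplasia_choice
        subject = {"metaplasia": metaplasia, "n_deviating": k}
        if duct_ratio is not None:  # only set for female calves (feature 2)
            subject["duct_ratio"] = duct_ratio
        subjects.append(subject)
    return subjects


run_a1 = simulate_run("none")                           # male, no metaplasia
run_a2 = simulate_run("mild")                           # male, metaplasia unless k = 0
run_a5 = simulate_run("mild", duct_ratio="more_ducts")  # female, both main features
```

Each subject would then be matched against the three targets; the match calculation itself (equation (4)) is not reproduced in this sketch.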

In every run the matches between the simulated subject and all three targets (treatment classes) "normal", "suspect" or "positive" were calculated according to equation (4). The results for the eight runs are shown in Figures 5 and 6.

After adjusting the weighting factors, the model shows the highest match percentage for the same target (class) as indicated by the tree (Figure 4) in all cases. The percentage for a diagnosis "positive" of a male animal (Figure 5) is 0% when no deviating feature is found, in contrast to a diagnosis of a female animal (Figure 6), where an elevated duct ratio can be found in combination with # deviating features = 0. For the same reason, the difference between the diagnoses "normal" and "positive" is smaller for male animals (Figure 5d) than for female animals (Figure 6d) in the case that # deviating features = 1. In general, the comparable situations illustrated in Figures 5a and 6a/b, in Figures 5c and 6c, and in Figures 5d and 6d, respectively, show highly comparable results. The addition of the features of group III (Figures 5b, 5d, 6d) modifies the outcome of the model in the sense that in many cases a score of 100% cannot be reached. This reflects the situation that finding metaplasia (male) or the combination of metaplasia and an elevated duct ratio (female) accompanied by only a few or even no other deviations is unlikely or highly unlikely.



**Figure 5.** Performance of the free access key (matrix) for the prostate. Four different runs are illustrated. The choices for the main features are indicated in the tables on top of the figures. The choices for feature 13 (number of deviating features) running from zero to nine are given on the x-axis. The match percentage (according to equation 4) is given on the y-axis. The main differentiating feature per animal type is printed bold.

The large coverage of the targets indicated as "positive" (Table 3) is caused by the fact that the model focuses on the correct diagnosis of possible treatment, minimising the possibility of false negative results. For both male and female calves the final diagnosis is based on one feature (see Table 2 and Figure 4), whereas the states of the other features are overruled.
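How such coverage values arise from equations (11) and (12) can be sketched with a reduced, two-feature example. The numeric coding of the states and the ranges below are assumptions chosen for illustration; they are not the values of Table 3, and the helper names are hypothetical.

```python
from math import prod


def span(lo, hi):
    """Number of states in an inclusive range: max - min + 1."""
    return hi - lo + 1


def target_coverage(target_ranges, total_ranges):
    """o_p in percent: product over features of s_ip / t_i (equation (11))."""
    shares = [span(*target_ranges[f]) / span(*total_ranges[f]) for f in total_ranges]
    return 100.0 * prod(shares)


def total_coverage(targets, total_ranges):
    """O_tot: sum of the target coverages, meaningful when D_tot = 100% (equation (12))."""
    return sum(target_coverage(r, total_ranges) for r in targets.values())


# Assumed coding: metaplasia 0 = none, 1 = mild, 2 = severe; feature 13 runs 0..9.
total_ranges = {"metaplasia": (0, 2), "n_deviating": (0, 9)}
targets = {
    "normal": {"metaplasia": (0, 0), "n_deviating": (0, 2)},
    "suspect": {"metaplasia": (0, 0), "n_deviating": (3, 8)},
    "positive": {"metaplasia": (1, 2), "n_deviating": (1, 9)},
}
coverage = {name: target_coverage(r, total_ranges) for name, r in targets.items()}
o_tot = total_coverage(targets, total_ranges)
```

In this toy setting "positive" covers about 60% of the space against roughly 10% and 20% for "normal" and "suspect", because the decisive feature leaves the range of feature 13 almost unrestricted; this is the same mechanism the paragraph above describes for the full model.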


**Figure 6.** Performance of the free access key (matrix) for the gland of Bartholin. Four different runs are illustrated. The choices for the main features are indicated in the tables on top of the figures. The choices for feature 13 (number of deviating features) running from zero to nine are given on the x-axis. The match percentage (according to equation 4) is given on the y-axis. The main differentiating feature per animal type is printed bold.


**Table 3.** Coverage of the variability space by the individual targets and the total dataset.

The correlation between the features is shown in Table 4. A full correlation is found only between the two features indicating the presence of metaplasia. This feature is included twice since different weighting factors appeared to be needed for the different animal types. Another reasonably high correlation factor was found between the duct ratio and the combined presence of metaplasia and elevated duct ratio. The presented level of correlation coefficients is in line with the calculated average redundancy: 0.405 (equation (8)).

**Table 4.** Matrix with Pearson's correlations between the features of the kernel model for diagnosis of illegal hormone use in veal calves. The colour of every cell (running from red to green) represents the value of the correlation coefficient.

The match table (Table 5) shows the relative resemblance between the targets based on equation (7). Except for the diagonal, the green colour, based on the calculations using equation (9), indicates that every target can be diagnosed uniquely compared to any other target. Hence, the separation capability is 100% (equation (10)).

**Table 5.** Matrix with the matches between the targets of the model for diagnosis of illegal hormone use in veal calves. The figure in every cell is calculated according to equation (7), the colour of every cell is based on equation (9).

**4. Discussion**

The process of identifying the level of treatment with growth hormones of veal calves is a rather specific situation for diagnosing in the broader framework of application of DSS in medicine [8-10]. Only one feature matters; all other features only modify the probability that a diagnosis belongs to the correct class. Besides that, a constraint dependency rule exists between feature 13 (number of deviating features; Table 1) and the total of features from group III plus either from group I or group II which show a state other than normal. The importance of the main features is visible in Table 2 and Figure 4. The two main features (male: presence of metaplasia, female: combined presence of metaplasia and an elevated duct ratio) both got a weighting factor of 9 in order to outnumber the features in group III for reaching a correct diagnosis (number of features in group III plus 1). Since the presence of metaplasia in the diagnosis of a female calf does not form the exclusive indicator for treatment, in contrast to the position of that feature in the diagnosis of the male calf, it got a weighting factor of only 1. The weight factors in the current model are fixed instead of being input sensitive [11].

There is no generic method for validation of data models in expert systems [7]. In the current study a top-down modelling approach was chosen: logic tables lead to a decision tree, which was the basis for the full matrix of the free access key. This approach does not provide a tool for handling constraint dependency rules [7], which was solved here by optimising the weighting factors. Rass et al. [12] listed a number of requirements for valid expert systems. Of these, the requirements for minimising the redundancy and for avoiding unintended synonyms are now supported by measures to calculate the extent of these parameters: redundancy (equation (8)) and separation capability (equation (10)), respectively.

The position of the features of group III (Table 1: indicating the individual deviating characteristics) in an extended diagnosis (Figures 5b, 5d, 6d) can be discussed in terms of fuzzy logic principles. In several experiments with fuzzy logic comparable results have been found [9, 13]. Here, probability or uncertainty is the basic aspect causing patterns in the model outcomes that can be explained as membership functions [13]. As an example, the presence of metaplasia in a prostate is a definite diagnosis for treatment with growth hormones (n = 1 in Figure 5c, in concordance with the tree in Figure 4), but it is highly unlikely that with such a diagnosis none of the other features of group III (Table 1) would show a state deviating from normal. The probability that an animal with the sole presence of metaplasia belongs to membership class "positive" is only slightly higher than its membership to the class "suspect" (n = 1 in Figure 5d). The kernel model without using the individual features of group III (strategy A) seems sufficient to reach a diagnosis. All the features underlying the depending feature 13 (group IV) are nevertheless included in the model in order to improve the performance of the user by supporting his or her examinations, and to provide the possibility of an iterative process of optimising the diagnosis [14].

Existing results of optimising a data model for reaching a diagnosis reveal that lower numbers of features appeared to be optimal [10]. In those cases that a model consists of only a few features, expressing them in terms of space dimensions (e.g. a two-dimensional space in
