4. Probability elicitation methods

The process of probability elicitation can be supported by a variety of techniques designed to aid experts when they find it hard to express their degrees of belief as numbers. These methods are based on setting up controlled situations in which probability assessments can be inferred from the expert's behavior [17]. In this section, we describe the use of probability scales with visual aids to make probability assessment easier for experts. However, it is worth noting that visual aids such as probability scales (i.e., aids that use numbers) still tend to be biased.

It is our understanding that the use of visual elements such as probability scales can improve the input quality of semiautomatic methods (i.e., those that need probability distributions as input), but indirect methods, which we do not discuss here, may improve the input quality as well. Several methods for the indirect elicitation of probabilities have been developed. Some well-known methods are the odds method, the bid method, the lottery method and the probability-wheel method, among others [17, 18]. These methods allow probabilities to be extracted without the expert having to mention probabilities explicitly, so to speak.

Both direct and indirect methods can be incorporated to some degree into semiautomatic methods. The purpose of this section is to show, as an example, one of these techniques, which can extend semiautomatic methods. Also, different techniques may produce different results, so we encourage readers to check a comprehensive review of issues related to the probability elicitation task, which has a section dedicated to direct and indirect methods [17].

#### 4.1 Probability scale

A probability scale is composed of a line, arranged vertically or horizontally, with discrete numerical anchors that denote probabilities. It is a direct probability assessment method. To assess a probability, the expert marks a position on the scale. The probability value is given by the distance of the mark from the zero point of the scale. An example of a numerical probability scale can be seen in Figure 3.

The advantage of using a scale is that it allows domain experts to think in terms of visual proportion rather than precise numbers. However, it is important to consider the biases that may be introduced by probability scales. For example, suppose an expert is asked to indicate several assessments on a single line. In that case, the expert is likely to bias the assessments towards esthetically distributed marks. This bias is known as the spacing effect [17] and can easily be avoided by using a separate scale for each probability. Another bias that may be introduced by probability scales is the tendency of people to use the middle of the scale. This bias is known as the centering effect [17].

Issues in the Probability Elicitation Process of Expert-Based Bayesian Networks

DOI: http://dx.doi.org/10.5772/intechopen.81602

Furthermore, scales can be used in combination with other components that may help in the task of probability assessment. In [20], a method is presented for eliciting a large number of conditional probabilities in a short time. This method was used to build a real-world BN for the diagnosis of esophageal cancer with more than 4000 conditional probabilities. This BN predicted the correct cancer stage for 85% of the patients [21]. The main idea of this method is to present the expert with a figure containing a double scale and a text fragment for each conditional probability. An example of combining probability scales with other components can be seen in Figure 5.


Figure 4. Probability scale with numbers and words.

There is no standard scale. For instance, anchors may vary in distance and values according to the domain, and lines can be arranged in different positions. Moreover, during probability assessment, one can use both numerical and verbal anchors. In [19], a double scale is proposed that combines numbers and textual descriptions of probability to aid the communication of probabilities. According to [19], the verbal descriptions commonly used by people to express probabilities are directly related to the numerical values of the probabilities themselves. In Figure 4, we can see an example of a double scale arranged vertically with numerical and verbal anchors.
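Such a double scale can be represented programmatically as ordered (number, word) anchor pairs. The sketch below is purely illustrative: the anchor words and values are placeholders, not the exact anchors proposed in [19].

```python
# Illustrative double scale: numerical anchors paired with verbal anchors.
# The words and values below are placeholders, not the exact anchors of [19].
DOUBLE_SCALE = [
    (0, "impossible"),
    (15, "improbable"),
    (25, "uncertain"),
    (50, "fifty-fifty"),
    (75, "expected"),
    (85, "probable"),
    (100, "certain"),
]

def nearest_verbal_anchor(probability_pct):
    """Return the verbal anchor closest to a numerical assessment in [0, 100]."""
    _, word = min(DOUBLE_SCALE, key=lambda anchor: abs(anchor[0] - probability_pct))
    return word

print(nearest_verbal_anchor(90))  # "probable"
```

A mapping like this is what lets a tool echo an expert's mark on the numerical scale back in words, supporting the communication of probabilities.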


Figure 3. Probability scale with numerical anchors.








Figure 5. Text fragment combined with a double scale for probability assessment.

On the left side is a text fragment describing the conditional probability to be assessed. On the right side is the double scale proposed in [19]. The text fragment is stated in terms of frequency rather than likelihood, which circumvents the need for mathematical notation of the conditional probability. According to [21], the frequency format has been reported to be less liable to lead to biases, and experts may experience considerable difficulty understanding conditional probabilities in mathematical notation. Conversely, such an approach may be less intuitive for domains in which it is difficult to imagine 100 occurrences of a rare event.

Nonetheless, in [20], the fragments of text and associated scales are grouped according to the conditional probability distribution they belong to. In so doing, domain experts can assess probabilities from the same conditional probability distribution simultaneously. In other words, the centering effect is avoided by presenting all the related probabilities (i.e., those from the same probability distribution) at once for the expert to assess. This approach considerably reduces the number of mental changes during the probability elicitation process. As for the spacing effect, the proposed method avoids it by using a separate scale for each probability.

#### 5. Semiautomatic methods

In this section, we present three methods for generating NPTs that ease the burden on experts during the quantification process of a BN. These methods allow the construction of large-scale BNs. The first is the RNM, which completely eliminates the need for direct probability assessment. The second is the WSA, which is based on two heuristics and needs only part of the NPT to be elicited from the expert. The third is an adaptation of the analytic hierarchy process (AHP), which reduces the cognitive effort, biases and inaccuracies that arise from estimating probabilities for all combinations of states of multiple parents at a time. From now on, we will refer to the latter simply as AHP. These three methods attack the magnitude problem of building NPTs.

#### 5.1 RNM

In [1], the ranked nodes method (RNM) is presented. This work introduces the concept of ranked nodes: ordinal random variables represented on a continuous scale ordered monotonically in the interval [0, 1]. For example, for the ordinal scale ["Low", "Medium", "High"], "Low" is represented by the interval [0, 1/3], "Medium" by the interval [1/3, 2/3], and "High" by the interval [2/3, 1]. This concept is based on the doubly truncated Normal (TNormal) distribution.

The TNormal distribution is defined by four parameters: μ, the mean (i.e., central tendency); σ², the variance (i.e., uncertainty about the central tendency); a, the lower bound (i.e., 0); and b, the upper bound (i.e., 1). With this distribution, it is possible to model a variety of curves (i.e., relationships), such as a uniform distribution, achieved when σ² = ∞, and highly skewed distributions, achieved when σ² = 0. In Figure 6, we show examples of TNormal distributions with the same μ but different σ².

Figure 6. Examples of TNormal.
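The effect of σ² on a TNormal over [0, 1] can be checked numerically. The sketch below uses only the standard library (the helper names are ours): it computes the truncated CDF and shows that a small variance concentrates virtually all the mass around μ, while a huge variance is practically uniform.

```python
import math

def normal_cdf(x, mu, sigma):
    """Standard Normal CDF evaluated via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def tnormal_cdf(x, mu, sigma2, a=0.0, b=1.0):
    """CDF of a Normal(mu, sigma2) doubly truncated to [a, b]."""
    sigma = math.sqrt(sigma2)
    lo, hi = normal_cdf(a, mu, sigma), normal_cdf(b, mu, sigma)
    x = min(max(x, a), b)
    return (normal_cdf(x, mu, sigma) - lo) / (hi - lo)

# Same mu, different sigma^2 (cf. Figure 6): mass falling in the middle third.
mass_mid_peaked = tnormal_cdf(2/3, 0.5, 0.001) - tnormal_cdf(1/3, 0.5, 0.001)
mass_mid_flat = tnormal_cdf(2/3, 0.5, 100.0) - tnormal_cdf(1/3, 0.5, 100.0)
print(mass_mid_peaked)  # close to 1: a small variance concentrates mass around mu
print(mass_mid_flat)    # close to 1/3: a huge variance is practically uniform
```

This is why σ² can be read as the expert's uncertainty about the central tendency: it alone moves the shape between near-deterministic and near-uniform.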

In this method, μ is defined by a weighted function of the parent nodes. There are four function types: weighted mean (WMEAN), Eq. (2); weighted minimum (WMIN), Eq. (3); weighted maximum (WMAX), Eq. (4); and a mix of the MIN and MAX functions (MIXMINMAX), Eq. (5). In practice, these functions define the central tendency of the child node for each combination of parent node states. The weight of each parent node, which quantifies the relative strength of the influence of that parent on the child node, must be defined by a constant w, where w ∈ ℕ.

$$\text{WMEAN}(z_{1,k}, \dots, z_{n,k}, w_1, \dots, w_n) = \frac{\sum_{i=1}^{n} w_i\, z_{i,k}}{\sum_{i=1}^{n} w_i} \tag{2}$$

$$\text{WMIN}(z_{1,k}, \dots, z_{n,k}, w_1, \dots, w_n) = \min_{i=1,\dots,n} \left\{ \frac{w_i\, z_{i,k} + \sum_{j \neq i} z_{j,k}}{w_i + n - 1} \right\} \tag{3}$$

$$\text{WMAX}(z_{1,k}, \dots, z_{n,k}, w_1, \dots, w_n) = \max_{i=1,\dots,n} \left\{ \frac{w_i\, z_{i,k} + \sum_{j \neq i} z_{j,k}}{w_i + n - 1} \right\} \tag{4}$$

$$\text{MIXMINMAX}(z_{1,k}, \dots, z_{n,k}, w_{\min}, w_{\max}) = \frac{w_{\min} \min_{i=1,\dots,n}\{z_{i,k}\} + w_{\max} \max_{i=1,\dots,n}\{z_{i,k}\}}{w_{\min} + w_{\max}} \tag{5}$$
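Eqs. (2)–(5) translate directly into code. In the sketch below (function names are ours), `z` holds the sampled parent values for one iteration and `w` the parent weights:

```python
def wmean(z, w):
    """WMEAN, Eq. (2): weighted mean of the parent values z_1..z_n."""
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)

def wmin(z, w):
    """WMIN, Eq. (3): for each parent i, weight its value against the rest,
    then take the minimum over all parents."""
    n = len(z)
    return min((w[i] * z[i] + sum(z) - z[i]) / (w[i] + n - 1) for i in range(n))

def wmax(z, w):
    """WMAX, Eq. (4): as WMIN but taking the maximum."""
    n = len(z)
    return max((w[i] * z[i] + sum(z) - z[i]) / (w[i] + n - 1) for i in range(n))

def mixminmax(z, w_min, w_max):
    """MIXMINMAX, Eq. (5): weighted blend of the extreme parent values."""
    return (w_min * min(z) + w_max * max(z)) / (w_min + w_max)

z = [0.1, 0.7]             # sampled parent values on [0, 1]
print(wmean(z, [1, 1]))    # equal weights: (0.1 + 0.7) / 2 = 0.4
print(wmin(z, [3, 1]))     # min((3*0.1 + 0.7)/4, (1*0.7 + 0.1)/2) = 0.25
print(mixminmax(z, 1, 3))  # (1*0.1 + 3*0.7)/4 = 0.55
```

Note how WMIN drags the result towards the lowest parent value and WMAX towards the highest, which matches their intended reading as "the child is only as good as its worst/best parent, moderated by the weights".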

Fenton et al. [1] do not present the details needed to implement the solution in practice. Despite presenting the mixture functions, no information is given regarding the algorithms used to generate and mix TNormals, to define sample sizes, or to derive a conventional NPT from the calculated TNormals. The latter enables the integration of ranked nodes with other node types, such as Boolean and continuous nodes, which brings more modeling flexibility.

In [22], a probabilistic algorithm is proposed for this purpose, composed of two main steps: (i) generate samples for the parent nodes and (ii) construct the NPT. In step (ii), for each possible combination of values of the parent nodes (i.e., each column of the NPT), the samples defined in the previous step are mixed according to a function selected by the user, and a TNormal is generated using the resulting mix and a variance defined by the user. An overview of the algorithm is shown in Figure 7.

Figure 7. Overview of the algorithm.

As already mentioned, a ranked node is conceptually represented by an ordinal scale, which is mapped to the continuous interval [0, 1]. Thus, it is represented as a set of uniform distributions. For an ordinal scale with three values (e.g., "Bad", "Moderate" and "Good"): U(0, 1) = p_bad * U(0, 1/3) ∪ p_moderate * U(1/3, 2/3) ∪ p_good * U(2/3, 1), where p is the density of the distribution. For the example shown in Figure 8, the set of uniform distributions is composed of the union of three uniform distributions: U(0, 1) = 54.7% * U(0, 1/3) ∪ 36.5% * U(1/3, 2/3) ∪ 8.8% * U(2/3, 1). Numerically, this union is calculated using samples. Considering a sample size of 10,000, to represent the NPT of the example shown in Figure 8, it is necessary to collect 5470 random samples from U(0, 1/3), 3650 random samples from U(1/3, 2/3) and 880 random samples from U(2/3, 1).

Figure 8. Conversion from ordinal to continuous scale.
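The union of weighted uniforms just described can be reproduced numerically. A sketch (the function name and seed are ours), drawing each sub-interval's share of a 10,000-element sample:

```python
import random

def ranked_node_samples(state_probs, n_samples=10_000, seed=42):
    """Draw samples from a mixture of uniforms over the m equal
    sub-intervals of [0, 1], one sub-interval per ordinal state."""
    rng = random.Random(seed)
    m = len(state_probs)
    samples = []
    for state, p in enumerate(state_probs):
        lo, hi = state / m, (state + 1) / m
        samples += [rng.uniform(lo, hi) for _ in range(round(p * n_samples))]
    return samples

# 54.7% "Bad", 36.5% "Moderate", 8.8% "Good" (cf. Figure 8):
s = ranked_node_samples([0.547, 0.365, 0.088])
print(len(s))  # 5470 + 3650 + 880 = 10000
```

Counting the samples per sub-interval recovers the original state probabilities exactly, which is what makes the sample-based representation interchangeable with the ordinal one.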

Figure 7 shows that the algorithm is composed of four collections: repository[], a vector to store the samples of the base states of the parent nodes; parents[k], a vector to store references to the parent nodes of each child node, in which k is the number of parents; states[m], a vector to store the states of each node, in which m is the number of possible values for the child node given the combination of states of its parents; and distribution[m], a vector to store the resulting distribution for each possible combination of states of the parent nodes.

The repository strategy is used for optimization purposes. First, distributions that represent the base states are registered in memory (i.e., in repository[]); these are states with hard evidence (i.e., the node has a 100% chance of being in a given state). For instance, for a node composed of the states ["Bad", "Moderate", "Good"], samples are registered for 100% "Bad", 100% "Moderate" and 100% "Good", which respectively have μ = 1/6, μ = 1/2 and μ = 5/6. For this purpose, samples are collected from a uniform distribution whose limits are defined by the thresholds of the scale.

For instance, for 100% "Good", samples are collected from a uniform distribution limited to the interval [2/3, 1]. In [22], it is empirically determined that a sample size of 10,000 is enough to guarantee a margin of error of less than 0.1%. Each sample


is registered with metadata regarding its configuration (i.e., number of states and μ). The data in repository[] is used to generate samples for a node. Therefore, the samples for a base state are generated only once and reused later. The next step consists of, for each combination of the parent nodes, mixing the TNormals using equidistant samples randomly selected for each parent node. The samples are mixed using one of the given functions (e.g., WMEAN, WMIN, WMAX or MIXMINMAX) and the defined variance.

To mix the distributions, a random element from each sample of the parents is removed and used to calculate a resulting element using a given function. For instance, consider node A with two parents B and C. If we are calculating the probabilities of A for the combination "Low"-"High" and the selected function is WMEAN with equal weights, if the values removed in an iteration were 0.1 and 0.7, the resulting value would be 0.4. This step must be repeated until the collections of samples are empty.

Afterwards, the set of calculated elements and the given σ are used as input to generate a TNormal. The resulting distribution is converted to an ordinal scale and represents a column in the NPT of the child node (i.e., in the given example, the column for the combination "Low"-"High"). At the end of this step, all the possible combinations of states of the parent nodes are evaluated and the NPT for the child node is completed.
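The column-construction loop just described can be sketched as follows. This is a simplified illustration, not the exact algorithm of [22]: the names are ours, and the final TNormal fit with the user-defined σ² is replaced by a direct histogram of the mixed values.

```python
import random

def npt_column(parent_samples, mix, n_states=3, seed=0):
    """Mix one element from each parent's sample pool and histogram the
    results onto the child's ordinal scale (one NPT column)."""
    rng = random.Random(seed)
    pools = [list(s) for s in parent_samples]
    for pool in pools:
        rng.shuffle(pool)  # random pairing of elements across parents
    counts = [0] * n_states
    for elems in zip(*pools):
        value = mix(elems)  # e.g., WMEAN with equal weights
        state = min(int(value * n_states), n_states - 1)
        counts[state] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Parents B = "Low" and C = "High" (cf. the example in the text):
rng_b, rng_c = random.Random(1), random.Random(2)
low = [rng_b.uniform(0, 1/3) for _ in range(1000)]
high = [rng_c.uniform(2/3, 1) for _ in range(1000)]
column = npt_column([low, high], mix=lambda e: sum(e) / len(e))
print(column)  # all mass lands on the middle state: [0.0, 1.0, 0.0]
```

With an equal-weight WMEAN, every mixed element of a "Low"-"High" pair falls in the middle third, so the resulting column concentrates on the middle state, matching the worked 0.1/0.7 → 0.4 example above.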

Accordingly, the inputs to generate the NPT of a child node are: a weighted expression capable of generating curves equivalent to distributions expected by the experts; a set of weights of the parent nodes; and a value for σ2. To determine the weighted expression one can ask the experts to assess the mode of the child node for different combinations of the extreme states of the parent nodes [23]. For instance, let us consider the Bayesian network shown in Figure 8 along with the mode assessments of the experts in Table 1.


software projects [24], software quality forecasting [25], air traffic control [26]

Issues in the Probability Elicitation Process of Expert-Based Bayesian Networks

In [8] the WSA method is proposed. This work introduced the concept of compatible parental configuration. The availability heuristic and the simulation heuristic are the base for this concept. As previously stated, the availability heuristic operates under the assumption that is easier to remember events that are more likely to occur. The simulation heuristic, in turn, operates according to which people determine the probability of an event based on how easy it is to simulate it


#### Table 1.
Mode assessments for teamwork: for each configuration of parent states, the expert marked the mode of Teamwork on the scale VL, L, M, H, VH.

| Row | Autonomy | Cohesion | Collaboration |
|-----|----------|----------|---------------|
| 1   | VL       | VL       | VL            |
| 2   | VL       | VL       | VH            |
| 3   | VL       | VH       | VH            |
| 4   | VH       | VH       | VH            |
| 5   | VH       | VH       | VL            |
| 6   | VH       | VL       | VL            |
| 7   | VH       | VL       | VH            |
| 8   | VL       | VH       | VL            |

First, let us consider rows 1 and 4, where all the parent nodes are in the lowest and highest states, respectively. As can be seen in Table 1, when all the parent nodes are in the lowest or highest states, the mode of the child node is also the lowest or highest state. Such a probability distribution can be obtained by any of the weighted expressions.

Now, let us consider row 1 as the initial state. Rows 2, 6 and 8 indicate that when the state of a single parent node shifts from the lowest to the highest state, the mode of the child node shifts towards the highest state. Similarly, considering row 4 as the initial state, rows 3, 5 and 7 indicate that when the state of a single parent node shifts from the highest to the lowest state, the mode of the child node also shifts towards the lowest state.

However, it is quite clear that the shift effect is stronger when it occurs from the lowest to the highest state. Hence, Table 1 reveals that the mode of the child node is inclined to move more towards the highest than the lowest states, which makes the WMAX function more suitable to express the distribution expected by the experts.

The process of determining the weights of the parent nodes and the variance parameter is not as straightforward as determining the weighted expression. There is no guideline in the literature, as far as we know, to aid in this task. Nonetheless, one can use the mode assessments in Table 1 as a starting point to define the weights of the parent nodes. For instance, considering that WMAX is the most suitable function to express the probability distribution, let us examine the rows in which the states shift from lowest to highest in Table 1.

Finally, let us consider row 1 as the initial state. Rows 2, 6 and 8 indicate that the parent nodes have different strengths of influence on the child node. That is, when the parent node Autonomy shifts from the lowest to the highest state, the mode of Teamwork shifts slightly towards the highest states; however, the shift is larger when the state changes in the parent node Collaboration, as can be seen by comparing rows 2 and 6. A similar effect is observed when comparing rows 6 and 8. Hence, the following constraint is derived from Table 1: Autonomy weight < Collaboration weight < Cohesion weight. Nevertheless, trial and error is still necessary to discover suitable values for the weights and the variance parameter.
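This trial-and-error search over weights can be supported by a small script. The sketch below assumes the WMAX expression in the form commonly given for ranked nodes, $WMAX(w, x) = \max_i[(w_i x_i + \sum_{j \neq i} x_j)/(w_i + n - 1)]$, with the ranked states mapped to midpoints of equal subintervals of $[0, 1]$; the weight values are hypothetical trial choices, not values prescribed in this chapter.

```python
# Sketch: exploring candidate RNM weights with the WMAX expression.
# Ranked states are mapped to midpoints of equal subintervals of [0, 1].
STATE_VALUE = {"VL": 0.1, "L": 0.3, "M": 0.5, "H": 0.7, "VH": 0.9}

def wmax(weights, values):
    """WMAX(w, x) = max_i [(w_i*x_i + sum_{j != i} x_j) / (w_i + n - 1)].
    Unlike the WSA weights, these weights are unnormalized (each >= 1)."""
    n = len(values)
    return max(
        (weights[i] * values[i] + sum(v for j, v in enumerate(values) if j != i))
        / (weights[i] + n - 1)
        for i in range(n)
    )

# Hypothetical trial weights honoring the constraint derived from Table 1:
# Autonomy < Collaboration < Cohesion.
weights = [1.0, 5.0, 3.0]  # order: Autonomy, Cohesion, Collaboration

def row_mean(states):
    """Central tendency of the child node for one row of Table 1."""
    return wmax(weights, [STATE_VALUE[s] for s in states])

# Rows 2, 6 and 8 of Table 1: a single parent raised from VL to VH.
row2 = row_mean(["VL", "VL", "VH"])  # Collaboration high
row6 = row_mean(["VH", "VL", "VL"])  # Autonomy high
row8 = row_mean(["VL", "VH", "VL"])  # Cohesion high
```

Feeding each resulting central tendency into a TNormal distribution with the candidate variance then yields a column of the NPT, whose mode can be compared against Table 1 to accept or reject the trial weights.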

This method solves the magnitude problem of constructing NPT in complex Bayesian networks. On the other hand, a drawback of this method is that the domain context needs to fit a pattern that can be modeled by one of the weighted expressions. This solution has been validated through case studies in different real-world domains, such as human resources management in software projects [24], software quality forecasting [25], air traffic control [26] and operational management [27].

#### 5.2 WSA


In [8] the WSA method is proposed. This work introduced the concept of compatible parental configuration, which is grounded on the availability and simulation heuristics. As previously stated, the availability heuristic operates under the assumption that it is easier to remember events that are more likely to occur. The simulation heuristic, in turn, holds that people determine the probability of an event based on how easy it is to simulate it mentally.

To formally define the concept of compatible parental configuration, we take as a basis the work of [28]. Superscripts are used to represent the states of a node and subscripts to differentiate the parent nodes. Suppose the parent node $Y_i$ is assigned an arbitrary state $y_i^v$, that is, $Y_i = y_i^v$. Another parent node $Y_j$ is considered compatible with $Y_i = y_i^v$ only when $Y_j$ is in the state $y_j^w$ that is most likely, according to the expert's knowledge, to coexist with $Y_i = y_i^v$. Hence, we use the notation $Comp(Y_i = y_i^v)$ to represent the set of states, over all parent nodes, that are compatible with $Y_i = y_i^v$.

$$Comp\left(Y_i = y_i^v\right) = \left\{ y_j^w,\ \forall j \neq i \;\middle|\; \max_{w=1,\dots,\left|Y_j\right|} P\left(y_j^w \mid y_i^v\right) \right\} \tag{6}$$

The compatible parental configurations are captured during the elicitation process by asking the domain experts to choose, off the top of their heads, a plausible combination of states for each $Comp(Y_i = y_i^v)$, which, theoretically, is easier to simulate and therefore prone to yield more realistic probabilities. Hence, the probability distributions for all compatible parental configurations, along with relative weights, are elicited from experts. The NPT is calculated using a weighted sum algorithm [8] which takes these probability distributions and weights as input. The input data of the algorithm is obtained from the experts' knowledge, as follows:

1. a relative weight (between zero and one) for each parent node, denoting its degree of influence on the child node: $w_1, \dots, w_n$;
2. $k_1 + \dots + k_n$ probability distributions of $X$, one for each compatible parental configuration.


$$p\left(x^{l} \mid y_{1}^{v_{1}}, y_{2}^{v_{2}}, \dots, y_{n}^{v_{n}}\right) = \sum_{i=1}^{n} w_{i}\, p\left(x^{l} \mid Comp\left(Y_{i} = y_{i}^{v_{i}}\right)\right) \tag{7}$$

where $w_i$ is the relative weight of the parent node $Y_i$, $l = 0, 1, \dots, m$ and $v_j = 1, 2, \dots, k_j$. A constraint must be observed: the sum of all the relative weights (i.e., of all parent nodes) must be exactly one. A weight equal to zero indicates that the parent node has no influence on the child node and therefore can be omitted from the relation. Conversely, a relative weight equal to one indicates that the parent node is the only determinant of the conditional probabilities of the child node.

For instance, let us consider the Bayesian network shown in Figure 2, where we wish to assess teamwork. For the sake of simplicity, let us say that all the parents have the states "Low", "Medium" and "High" instead of the five states from the original example. With WSA, $3 \times 3$ distributions are needed to construct a complete NPT, against $3^3$ in the case of manual elicitation. Starting with the parent $Y_1$, let us say that the domain expert subjectively interprets the compatible parental configurations as an equivalence relation as follows:

$$\{Comp(Y\_1 = s)\} \equiv \{Comp(Y\_2 = s)\} \equiv \{Comp(Y\_3 = s)\}, \text{for } s = l, m, h \tag{8}$$

When the domain expert provides 3 probability distributions over the node $Y_1$, then all $3 \times 3$ distributions for compatible parental configurations are obtained. To generate the NPT, the expert must assign relative weights to the parents to quantify the relative strengths of their influence on the child node. Let us say that the expert interprets Autonomy and Cohesion as having the same influence strength on the child node, and Collaboration as three times more important than Cohesion or Autonomy, hence assigning the following weights: $w_1 = 0.2$, $w_2 = 0.2$, $w_3 = 0.6$.

With the weights and 3 probability distributions over the node $Y_1$ as inputs, the weighted sum algorithm calculates all the $3^3$ distributions required to populate the NPT. On the other hand, let us say that Eq. (8) is not satisfied; then all the $3 \times 3$ probability distributions must be elicited.

In such a case, the probability of Teamwork (X) = "Low" conditioned on Autonomy (Y1) = "Low", Cohesion (Y2) = "Medium", and Collaboration (Y3) = "High" would be given by:


$$\begin{aligned} p(X=l \mid Y_1=l, Y_2=m, Y_3=h) = {}& w_1\, p(X=l \mid \{Comp(Y_1=l)\}) \\ {}+{}& w_2\, p(X=l \mid \{Comp(Y_2=m)\}) \\ {}+{}& w_3\, p(X=l \mid \{Comp(Y_3=h)\}) \end{aligned} \tag{9}$$

This summarizes the WSA method; for an in-depth description, please check [8]. Unfortunately, [8] does not describe how to deal with situations where the expert cannot select a single compatible parental configuration. Hence, an extension to this method is proposed by [29] to deal with such situations by averaging the probabilities of the valid compatible parental configurations that experts might select.
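The weighted sum of Eqs. (7) and (9) is a simple convex combination of the elicited distributions, and can be sketched as follows. The three distributions below are hypothetical elicited values chosen for illustration, not numbers taken from [8]; the weights are those of the teamwork example.

```python
def wsa_row(weights, comp_dists):
    """One column of the NPT via the weighted sum algorithm (Eq. 7):
    mixes the distributions elicited for each parent's compatible
    parental configuration using the relative weights."""
    assert abs(sum(weights) - 1.0) < 1e-9, "relative weights must sum to one"
    n_states = len(comp_dists[0])
    return [
        sum(w * dist[s] for w, dist in zip(weights, comp_dists))
        for s in range(n_states)
    ]

# Hypothetical elicited distributions for p(X | Comp(Y1 = l)),
# p(X | Comp(Y2 = m)) and p(X | Comp(Y3 = h)), over the states
# (Low, Medium, High) of Teamwork.
weights = [0.2, 0.2, 0.6]
comp_dists = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
row = wsa_row(weights, comp_dists)  # the column of Eq. (9)
```

Because the weights sum to one and each input is a probability distribution, every generated column automatically sums to one.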

#### 5.3 AHP

Although the direct assessment of probabilities in the construction of NPT is feasible for small Bayesian networks and relatively simple domains, for medium to large networks the complexity and burden for experts grow substantially. As the number of parents and states increases, it becomes more difficult for experts to reason about conditional probabilities with multiple parents and multiple combinations of states at once, and the assessments become more susceptible to biases and inaccuracies [11].

In [11] a systematic approach is proposed for generating the conditional probabilities of nodes with multiple parents. It is an adaptation of the AHP method for the task of probability elicitation and semiautomatic generation of NPT, where the expert only needs to provide probability assessments (i.e., indirect ones) conditioned on single parents. In this approach, the probability assessments are extracted from pairwise judgments of the states. The NPT is generated through the product of the probabilities of the child node conditioned on single parents.

Before using the proposed method [11], it is required to define an agreed-upon scale to perform the pairwise judgments over the states of the node. Saaty's scale [30] can be used for this purpose, or a custom one can be created. A good example of how to obtain a scale can be consulted in [19], in which four successive experiments were performed to generate a scale with numbers and words. Saaty's scale has nine values, as seen in Table 2.

Issues in the Probability Elicitation Process of Expert-Based Bayesian Networks DOI: http://dx.doi.org/10.5772/intechopen.81602


#### Table 2.
Scale for the pairwise comparisons.

| Scale | Definition | Explanation |
|-------|------------|-------------|
| 1 | Equally likely | Event A and event B are equally likely |
| 2 | Weak or slight | Intermediate value |
| 3 | Moderately more likely | Event A is moderately more likely than event B |
| 4 | Moderate plus | Intermediate value |
| 5 | Strongly more likely | Event A is strongly more likely than event B |
| 6 | Strong plus | Intermediate value |
| 7 | Very strongly more likely | Event A is very strongly more likely than event B |
| 8 | Very, very strong | Intermediate value |
| 9 | Extremely more likely | Event A is extremely more likely than event B |


For a better understanding of the method, we substitute the original terminology used in the AHP for terms more appropriate to the probability context. Thus, the term attribute is replaced by event, and the term importance is replaced by likelihood. To obtain prior probabilities, pairwise comparisons of all states of the node are performed. Since each state is compared to every other state, we can assemble a comparison matrix. In Figure 9 we see an example of a comparison matrix used to define the prior probabilities of a node.

In this matrix, each element $a_{ij}$ ($i = 1, 2, \dots, n$; $j = 1, 2, \dots, n$) is specified by the question "comparing the state $x_{s_i}$ with $x_{s_j}$, which is more likely, and how much more likely?". Once we have filled in the values of $a_{ij}$, we can find the values of $a_{ji}$ by calculating the inverse of $a_{ij}$, i.e., $1/a_{ij}$. The final result is a reciprocal matrix with all elements on the diagonal equal to 1, that is, $a_{ii} = 1$ for all $i$.

The relative priority of $x_{s_i}$ is obtained from the principal eigenvector $\omega = (\omega_1, \omega_2, \dots, \omega_n)^T$ of the matrix $(a_{ij})_{n \times n}$, and the consistency of the matrix is the consistency ratio $CR = CI/RI$, where $CI$ is the consistency index, defined by $(\lambda_{max} - n)/(n - 1)$, $\lambda_{max}$ is the maximum eigenvalue corresponding to $\omega$, and $RI$ is the random index given by Table 3. A comparison matrix with $CR$ less than 0.10 is considered acceptable [11], although [31] has observed that this threshold may be inappropriate for the purpose of evaluating probabilities. Since the sum of all elements in $\omega$ is 1 and the $i$th element $\omega_i$ represents the relative importance of the state $x_{s_i}$, $\omega_i$ is now interpreted as the prior probability of the state $x_{s_i}$, that is, $P(x_{s_i}) = \omega_i$.

#### Figure 9.
Comparison matrix for prior probability elicitation of a node X.

#### Table 3.
Random consistency index where n is the number of states.

| n  | 1 | 2 | 3    | 4    | 5    | 6    | 7    | 8    | 9    |
|----|---|---|------|------|------|------|------|------|------|
| RI | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 |
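This prior-elicitation step can be sketched in a few lines of code. The comparison values below are hypothetical judgments, and the principal eigenvector is approximated by power iteration rather than an exact eigendecomposition; the `RI` table reproduces Table 3.

```python
# Random consistency index (Table 3), keyed by the number of states n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_priorities(A, iters=500):
    """Approximate the principal eigenvector of a reciprocal comparison
    matrix by power iteration; returns (priorities, consistency ratio).
    Requires n >= 3, since RI is zero for n <= 2."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # normalize so the priorities sum to one
    # Estimate lambda_max from A w = lambda w at convergence.
    lam = sum(sum(A[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return w, ci / RI[n]

# Hypothetical pairwise judgments for a node X with states (Low, Medium, High):
# "Low" is 3 times more likely than "Medium" and 5 times more likely than "High".
A = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
priors, cr = ahp_priorities(A)
```

The vector `priors` sums to one and is read directly as $P(x_{s_i}) = \omega_i$; the matrix is accepted only if `cr` is below the 0.10 threshold.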


Similarly, to obtain the probabilities of a node $X$ with a single parent $Y$, we estimate $P(x_{s_i} \mid y_{s_j})$. In Figure 10 we see the resulting matrix when node $Y = y_{s_j}$.

In this matrix, each element $a_{pq}$ ($p = 1, 2, \dots, n$; $q = 1, 2, \dots, n$) is specified by questions such as "if the node $Y$ is in the $y_{s_j}$ state, comparing the states $x_{s_i}$ and $x_{s_j}$ of the child node $X$, which one is more likely, and how much more likely?". After obtaining $\omega_{ij}$ ($i = 1, \dots, n$), we have $P(X = x_{s_i} \mid Y = y_{s_j}) = \omega_{ij}$. The number of matrices needed to obtain $\omega_{ij}$ ($i = 1, 2, \dots, n$; $j = 1, 2, \dots, m$) is equal to the number of states of $Y$. The obtained results compose the NPT of the child node $X$ conditioned on the states of the parent $Y$, as shown in Figure 11.

The approach to generate the conditional probabilities for multi-parent nodes is based on [32], which states that when a node $A$ in a Bayesian network has two parents $B$ and $C$, its conditional probability on $B$ and $C$ can be approximated by $P(A \mid B, C) = \alpha\, P(A \mid B)\, P(A \mid C)$, where $\alpha$ is a normalizing factor that ensures that $\sum_{a \in A} P(a \mid B, C) = 1$. Hence, to generate the complete NPT, Eq. (10) is applied:

$$P\left(X = x^{s_i} \mid Y_1 = y_1^{s_1}, Y_2 = y_2^{s_2}, \dots, Y_k = y_k^{s_k}\right) = \alpha \prod_{j=1}^{k} P\left(X = x^{s_i} \mid Y_j = y_j^{s_j}\right) \tag{10}$$

This approach focuses on easing the burden for experts by automatically generating the probabilistic distributions of nodes with multiple parents, and consequently the complete NPT, through the product of the probabilities conditioned on single parents. Thus, the expert assesses the probabilities of a particular child node conditioned on each of its parents, one at a time, and these probabilities are combined to obtain the node's conditional probability given all its parents.
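A minimal sketch of this composition step (Eq. (10)), assuming hypothetical single-parent conditionals of the kind obtained from the comparison matrices:

```python
def npt_column(single_parent_dists):
    """Combine P(X | Y_j = y_j), one vector per parent's current state,
    into P(X | Y_1, ..., Y_k) by the normalized product of Eq. (10)."""
    n_states = len(single_parent_dists[0])
    unnorm = [1.0] * n_states
    for dist in single_parent_dists:
        unnorm = [u * p for u, p in zip(unnorm, dist)]
    z = sum(unnorm)  # the normalizing factor makes the column sum to one
    return [u / z for u in unnorm]

# Hypothetical conditionals P(X | Y1 = y1) and P(X | Y2 = y2)
# over three states of X:
col = npt_column([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
```

Repeating this for every combination of parent states yields the complete NPT, one normalized column at a time.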


#### Figure 10.

Comparison matrix of a node X conditioned on a single parent Y in the state $y_{s_j}$.


#### Figure 11.

Resulting NPT for a single parent node.


In [31] a similar method is proposed, also based on the AHP, which allows the quantitative evaluation of the inconsistency of experts in the task of probability assessment. The difference between the proposed methods is that in [11] the magnitude problem of constructing NPT is reduced with a semiautomatic approach for the generation of the NPT, and the cognitive effort is reduced because the experts only need to evaluate, indirectly, probability distributions conditioned on a single parent at a time, whereas in [31] the effort is even greater than the direct elicitation of probabilities. Nonetheless, it is our understanding that the method proposed in [31] can somewhat extend other methods such as the WSA, without causing too much overhead. However, further studies are needed to confirm this.

### 6. Conclusion


Despite its recent popularity, the construction of BN is still a challenging task. One of the main obstacles is defining the NPT for large-scale BN. It is possible to automate this process using batch learning, but it requires a database with enough information, which in practice is not common. The other option is to elicit data from experts, which is unfeasible in most cases due to the number of probabilities required. A third option is to use semiautomatic methods that, given an input (i.e., data elicited from experts), generate the NPT.

In this chapter, we present three semiautomatic methods, found in an exploratory study through a literature review. These methods help, to a certain extent, to minimize the effects of human biases by reducing the number of parameters that are required to construct a complete NPT. However, these methods are highly reliant on the input data elicited from experts; therefore, flawed input necessarily produces nonsense output. For this reason, we present one of many probability elicitation techniques as an example, which can improve the input data needed by the semiautomatic methods and reduce the garbage in/garbage out effect.

The biggest problem with elaborate probability elicitation techniques is undoubtedly their cost, which is often greater than that of the direct elicitation of probabilities. Thus, these techniques are not well suited for the construction of large-scale BN, despite being useful for dealing with well-known biases. However, it is our understanding that the cost of using elaborate probability elicitation techniques is drastically reduced when only a small fraction of the data that would be necessary for the manual definition of NPT needs to be elicited. Therefore, the combination of semiautomatic methods and elaborate probability elicitation techniques might help to build more reliable BN.

For example, let us consider the WSA method, which uses a partially elicited NPT to generate a complete one using the concept of compatible parental configurations, weights of the parents and a weighted sum algorithm. Once the compatible parental configurations have been chosen, their probabilities can be elicited using a sophisticated probability elicitation technique with a rather small overhead. In this way, the probability elicitation technique becomes feasible and, theoretically, the input of the semiautomatic method is improved.

Nonetheless, it is evident that some methods may benefit more from elaborate probability elicitation techniques than others. However, it is still possible to use these techniques even in a method such as RNM. For example, the expert can provide the full probabilities rather than only the mode of each probability distribution for the combinations of extreme states (see Table 1). We believe that studies must be carried out to check whether combining elaborate probability elicitation techniques with semiautomatic methods can indeed improve the construction of large-scale BN.

Enhanced Expert Systems
