Abstract

A major challenge in constructing a Bayesian network (BN) is defining the node probability tables (NPT), which can be learned from data or elicited from domain experts. In practice, there is often not enough data for learning, and elicitation from experts is the only option. However, the complexity of defining NPT grows exponentially with the number of parent nodes, making the elicitation process costly and error-prone. In this research, we conducted an exploratory literature review that identified the main issues related to probability elicitation and solutions for constructing large-scale NPT while reducing exposure to these issues. In this chapter, we present in detail three semiautomatic methods that reduce the burden on experts. We discuss the benefits and drawbacks of these methods and present directions on how to improve them.

Keywords: Bayesian networks, probability elicitation, node probability table, expert systems, artificial intelligence

### 1. Introduction

A Bayesian network (BN) is a mathematical model that graphically and numerically represents the probabilistic relationships between random variables through Bayes' theorem. This technique is becoming popular as an aid to decision-making in several domains, owing to advances in computational capacity that make the calculation of complex BN possible [1]. Some examples of BN application areas are: software development project management [2, 3]; large-scale engineering projects [4]; and the prediction of success in innovation projects [5].

However, there are open challenges related to the construction of BN. One of them is building the node probability tables (NPT). When databases with enough information about the problem in question exist, the construction of NPT can be automated through batch learning [6]. Unfortunately, in practice, in most cases, there is not enough data. It is then necessary to elicit knowledge from experts and define the NPT manually [1].

Furthermore, experts can often understand and identify key relationships that data alone may fail to discover [7]. This motivates the concept of smart data defined in [7]: a method that supports data engineering and knowledge engineering approaches, with emphasis on applying causal knowledge and real-world facts to develop models.

*Enhanced Expert Systems*


Issues in the Probability Elicitation Process of Expert-Based Bayesian Networks


DOI: http://dx.doi.org/10.5772/intechopen.81602


In this context, it is necessary to manually elicit data from experts to define the NPT. However, given that the complexity of defining NPT increases exponentially, for large-scale BN, it becomes impracticable to manually define all the probability functions that compose each NPT [1]. In addition, experts often have time constraints and are rarely interested in manually defining NPT, partially because it is necessary to work with many probabilistic distributions for long periods [8].

In addition, other factors may compromise the process of probability elicitation to construct the NPT, such as commonly used heuristics. Some well-known heuristics used to reduce the cognitive effort of probability assessment may lead the expert towards biased judgments of probability, resulting in systematic errors. Moreover, experts are hardly able to keep distributions mutually consistent during the NPT definition [1]. Finally, factors such as boredom and fatigue are enough to make the assessment criteria drift during probability assessment [8], when in fact they should be applied uniformly throughout the whole elicitation process.

A solution to this problem was proposed in [1], which will be referred to herein as the ranked nodes method (RNM). Its goal is to define the NPT of the parent nodes and then generate the NPT of the child nodes. Ref. [1] introduces the concept of ranked nodes: ordinal random variables represented on a monotonically ordered continuous scale. A fundamental feature of this method is that mathematical expressions generate the child node's NPT. These expressions define the central tendency of the child node for each combination of states of the parent nodes. Their inputs are a set of weights for the parent nodes, which quantify the relative strength of each parent's influence on the child node, and a variance parameter.
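To make the idea of generating a child column from weights and a variance more concrete, the following is a minimal sketch, not the exact procedure of [1]. It assumes that each ordinal state is mapped to an equal-width interval on [0, 1], that each parent is represented by the midpoint of its interval, that the central tendency is a weighted mean, and that the child distribution is a normal distribution truncated to [0, 1]. The function names and the numbers are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ranked_column(parent_states, weights, variance, n_states=5):
    """Child distribution for one combination of parent states (0-based indices).

    Assumption: parents are summarized by interval midpoints on [0, 1].
    """
    width = 1.0 / n_states
    midpoints = [(s + 0.5) * width for s in parent_states]
    # Central tendency: weighted mean of the parent values.
    mean = sum(w * m for w, m in zip(weights, midpoints)) / sum(weights)
    sd = math.sqrt(variance)
    # Probability mass inside [0, 1], used to renormalize (truncation).
    mass = normal_cdf(1.0, mean, sd) - normal_cdf(0.0, mean, sd)
    return [
        (normal_cdf((k + 1) * width, mean, sd) - normal_cdf(k * width, mean, sd)) / mass
        for k in range(n_states)
    ]

# Hypothetical input: two parents "very high", one "medium"; weights 3, 2, 1.
column = ranked_column(parent_states=[4, 4, 2], weights=[3, 2, 1], variance=0.01)
print([round(p, 3) for p in column])
```

With these inputs the expert supplies only three weights and one variance, and every column of the child NPT follows from the same expression.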

Another approach was proposed in [8], which will be referred to here as the weighted sum algorithm (WSA). This method uses well-known heuristics in its favor, more precisely, the availability heuristic [9] and the simulation heuristic [10]. The main focus of this method is to obtain part of the NPT from experts by asking questions about cases that are easy for them to recall, which are likely to be associated with more realistic probabilities. In the WSA, the remainder of the NPT is generated using interpolation techniques.
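The interpolation step can be illustrated with a rough sketch in the spirit of the WSA; the exact formulation is given in [8] and may differ. The sketch assumes that, for each parent and each of its states, the expert has provided one child distribution for an easy-to-recall configuration judged compatible with that parent state, plus one relative weight per parent summing to one. The name `wsa_npt` and all numbers are hypothetical.

```python
from itertools import product

def wsa_npt(weights, compatible):
    """Fill a full NPT from a few elicited distributions.

    compatible[j][s] is the elicited child distribution for the
    configuration the expert judges compatible with parent j in state s.
    Each column of the result is the weighted sum of the distributions
    selected by the parent states of that column.
    """
    n_child = len(compatible[0][0])
    state_ranges = [range(len(per_parent)) for per_parent in compatible]
    npt = {}
    for config in product(*state_ranges):
        column = [0.0] * n_child
        for j, s in enumerate(config):
            for x in range(n_child):
                column[x] += weights[j] * compatible[j][s][x]
        npt[config] = column
    return npt

# Hypothetical example: two binary parents and a binary child.
weights = [0.6, 0.4]
compatible = [
    [[0.9, 0.1], [0.2, 0.8]],  # parent 0 in state 0, state 1
    [[0.8, 0.2], [0.3, 0.7]],  # parent 1 in state 0, state 1
]
npt = wsa_npt(weights, compatible)
print(npt[(0, 1)])
```

In this sketch the expert assesses four distributions and two weights, and the remaining columns are obtained by interpolation between the elicited ones.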

A systematic approach to generate NPT of nodes with multiple parents is proposed in [11]. This approach is an adaptation of the analytic hierarchy process (AHP) method for the task of probability elicitation and semiautomatic generation of NPT, in which the expert needs only to make the assessment of probabilities conditioned on single parents. In this approach, the probability assessment is indirect by means of paired state judgments and the NPT is generated through the calculation of the product of the probabilities of the child node conditioned on single parents.
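The product step described above can be sketched as follows. This is only an illustration under stated assumptions: the single-parent distributions are taken as already elicited (the paired-judgment step of [11] is not reproduced here), the product is renormalized so that the column sums to one, and the function name and numbers are hypothetical.

```python
def combine_single_parent(distributions):
    """Multiply P(child | parent_k) state by state and renormalize."""
    n_states = len(distributions[0])
    combined = [1.0] * n_states
    for dist in distributions:
        for s in range(n_states):
            combined[s] *= dist[s]
    total = sum(combined)
    return [p / total for p in combined]

# Hypothetical three-state child (low, medium, high) with two parents.
p_given_parent_a_high = [0.1, 0.3, 0.6]
p_given_parent_b_high = [0.2, 0.3, 0.5]
column = combine_single_parent([p_given_parent_a_high, p_given_parent_b_high])
print([round(p, 3) for p in column])
```

Because each input is conditioned on a single parent, the number of distributions to elicit grows with the sum of the parents' state counts rather than with their product.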

The three methods stated above reduce the burden on experts and allow the construction of complex BN for which manual elicitation of the NPT is infeasible and, generally, there is not enough data to use batch learning. The reduced number of parameters needed to generate the NPT and, consequently, the reduced number of questions to ask the experts make it easier for the facilitator (e.g., a BN expert) to deal with heuristics and possible biases during the NPT construction process. These methods can still be extended with elaborate probability elicitation techniques (i.e., to improve their input).

Therefore, the objective of this research is to assess in detail three semiautomatic methods to generate NPT. We identified these methods in an exploratory study through a literature review. Additionally, we present heuristics that must be acknowledged during probability assessment for NPT construction and discuss extensions to these methods. It is our understanding that these methods can still benefit from elaborate probability elicitation techniques. Such techniques add overhead when the NPT is defined manually, but this overhead is greatly reduced with semiautomatic methods (i.e., given the reduced number of questions to ask the experts), making them a viable choice to improve the methods' input.

This chapter is organized as follows. Section 2 presents an introduction to BN. Section 3 presents common heuristics that should be acknowledged and considered during the probability elicitation process. Section 4 presents a probability elicitation technique that can extend some of the semiautomatic methods. Section 5 presents three semiautomatic methods to generate NPT. Section 6 presents our conclusions and future work.

### 2. Background


Bayesian networks are graph models used to represent knowledge about an uncertain domain [12]. A Bayesian network, $B$, is a directed acyclic graph that represents a joint probability distribution over a set of random variables $V$ [13]. The network is defined by the pair $B = \{G, \theta\}$, where $G = (V, E)$ is a directed acyclic graph with nodes $V$ representing random variables and edges $E$ representing the direct dependencies between these variables. $\theta$ is the set of probability functions (i.e., node probability tables), which contains the parameter $\theta_{v_i \mid \pi_i} = P_B(v_i \mid \pi_i)$ for each $v_i$ in $V_i$ conditioned on $\pi_i$, the set of parents of $V_i$ in $G$. Eq. (1) gives the joint probability distribution defined by $B$ over $V$. An example of a BN is depicted in Figure 1.

$$P_B(V_1, \ldots, V_n) = \prod_{i=1}^n P_B(V_i \mid \pi_i) = \prod_{i=1}^n \theta_{V_i \mid \pi_i} \tag{1}$$

In the example of Figure 1, the probability of a person having cancer is calculated according to two variables: "Relatives had cancer" (Y1) and "Smoke" (Y2). The ellipses represent the nodes and the arrows represent the arcs. Even though the arcs represent the direction of the causal connection between the variables, information can propagate in any direction [14]. Hence, the direction of the arrows indicates the dependency used to define the probability functions. In this example, it is assumed that all the variables are Boolean. Since the node "Cancer" is pointed to by Y1 and Y2, its probability function is composed of probabilities for all possible combinations of states of Y1 and Y2.

Figure 1. BN example.
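As a small worked illustration of Eq. (1) on the cancer example, the sketch below multiplies each node's probability given its parents to obtain the joint distribution. All numerical values are hypothetical and chosen only for illustration; they do not come from this chapter.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
p_y1 = {True: 0.1, False: 0.9}  # P(Relatives had cancer)
p_y2 = {True: 0.3, False: 0.7}  # P(Smoke)
# NPT of Cancer: P(Cancer = True | Y1, Y2), one entry per parent combination.
p_cancer_true = {
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.15,
    (False, False): 0.02,
}

def joint(y1, y2, cancer):
    """Eq. (1): product of each node's probability given its parents."""
    p_c = p_cancer_true[(y1, y2)]
    return p_y1[y1] * p_y2[y2] * (p_c if cancer else 1.0 - p_c)

# The joint distribution sums to 1 over all 2^3 assignments.
total = sum(joint(*values) for values in product([True, False], repeat=3))

# Marginal probability of cancer, obtained by summing out Y1 and Y2.
p_cancer = sum(joint(y1, y2, True) for y1, y2 in product([True, False], repeat=2))
print(round(total, 6), round(p_cancer, 4))
```

The root nodes Y1 and Y2 have no parents, so their factors are plain marginal probabilities; only the Cancer node needs a table indexed by parent states.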

#### 2.1 NPT's complexity

A challenge in constructing a BN is defining the NPT, which can be learned from data or elicited from domain experts. In practice, it is common not to have enough data for learning and elicitation from experts is the only option. However, the complexity of defining NPT grows exponentially, which makes the elicitation process costly and error-prone.


Let us consider the following example shown in Figure 2. In this BN, we want to assess Teamwork efficiency of a group of people that works collectively to achieve certain goals. Teamwork is directly influenced by Autonomy (i.e., self-management ability and shared leadership); Cohesion (i.e., the capacity of being in close agreement and work well together); and Collaboration (i.e., the ability to communicate and coordinate). This example will be used throughout this chapter.

To elicit all the probabilities needed to construct the NPT of the child node Teamwork, a facilitator (e.g., a BN expert) has to ask $5^3 = 125$ questions to the expert, one for each $P(v_i \mid \pi_i)$, that is, one for each combination of states of the three parents. As we can see, the complexity of performing this task grows exponentially as the number of parents increases, making it quite expensive and error-prone.
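The growth can be checked with a few lines of arithmetic. The sketch below counts, for a child with five states and a varying number of five-state parents, how many conditional distributions (one per combination of parent states) and how many individual probabilities the NPT contains. The helper name `npt_size` is hypothetical.

```python
def npt_size(child_states, parent_state_counts):
    """Return (parent configurations, total NPT entries) for one child node."""
    configurations = 1
    for n in parent_state_counts:
        configurations *= n
    return configurations, configurations * child_states

for n_parents in range(1, 6):
    configs, entries = npt_size(5, [5] * n_parents)
    print(f"{n_parents} parents: {configs} distributions, {entries} probabilities")
```

For the Teamwork node with three parents, this gives 125 distributions and 625 individual probabilities; adding a fourth five-state parent multiplies both by five.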

Several methods have been proposed to address this problem; Noisy-OR and Noisy-MAX are two popular ones. However, Noisy-OR applies only to Boolean nodes and, according to [1], Noisy-MAX does not model the extent of relationships required for large-scale BN. In this chapter, we present methods found in the literature that are applicable to a larger range of BN.

Figure 2. BN example adapted from [15] where a child node Teamwork is influenced by three parent nodes: Autonomy (Y1), Cohesion (Y2), and Collaboration (Y3). Each node has five ordinal states: very low (VL), low (L), medium (M), high (H), very high (VH).

### 3. Heuristics in probability

The quantification of a BN consists of converting expert knowledge, acquired through personal experience, into probabilistic knowledge by eliciting a large number of subjective probabilities that reflect the expert's belief about something at a given moment. Probability assessment can be described as the task of quantifying the chance of an event occurring, using percentages. However, as the degree of complexity increases, it becomes increasingly difficult to estimate the probability of occurrence of each of the possible events in a given scenario.

For instance, we may have a hunch as to who will win a particular tournament at a particular time, but we will never know the exact probability for sure, since the number of factors that can influence the event goes beyond our reach. Apart from that, epistemic uncertainties (e.g., lack of knowledge about all the participants in the tournament) and aleatory uncertainties (e.g., the possibility of a team losing a player) play an important role in probability assessment. Nonetheless, if asked, one is capable of making an evaluation and giving a quick answer. How do people manage to judge the probability of highly uncertain events?

According to [16], people make use of a limited number of heuristics, mental shortcuts, to reduce the complexity of judging the probability of an uncertain event. These mental shortcuts reduce the cognitive effort required to judge the probability of such events. However, they can lead to biases that result in systematic errors. In [16], three commonly used heuristics are presented: representativeness; availability; and anchoring.

The representativeness heuristic [16] describes the process by which people use the similarity of two events to estimate the degree to which one event is representative of another. It is used to answer questions such as: What is the probability that an event A originates from a process B? What is the probability of a process B generating event A? That is, if A is highly representative of B, the probability of A originating from B is judged to be high. Conversely, if A is not representative of B, the probability of A originating from B is judged to be low.

Consider the following example adapted from [16]: "Steve is very shy and withdrawn, has little interest in people, or in the real world. He has need for order and organization, and a passion for details". Based on this description, what is Steve's most likely profession? Farmer or librarian? You probably thought of a librarian. That happens because the probability of Steve being a librarian is evaluated by the degree to which he is representative of, or similar to, the stereotype of a librarian. However, several other factors that should have a significant effect on probability, such as the prior probability, or base-rate frequency, of the outcomes, have no effect on representativeness. For example, the fact that there are many more farmers than librarians should be considered in this case, but it is neglected.
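The effect of neglecting the base rate can be made explicit with Bayes' theorem. The sketch below uses purely hypothetical numbers: it assumes 1 librarian for every 20 farmers and a description that is four times more typical of librarians than of farmers. Neither figure comes from [16].

```python
def posterior_librarian(prior_librarian, p_desc_given_librarian, p_desc_given_farmer):
    """Bayes' theorem for two mutually exclusive professions."""
    prior_farmer = 1.0 - prior_librarian
    evidence = (p_desc_given_librarian * prior_librarian
                + p_desc_given_farmer * prior_farmer)
    return p_desc_given_librarian * prior_librarian / evidence

# Hypothetical inputs: base rate 1/21, likelihoods 0.4 versus 0.1.
p = posterior_librarian(
    prior_librarian=1 / 21,
    p_desc_given_librarian=0.4,
    p_desc_given_farmer=0.1,
)
print(round(p, 3))
```

Under these assumptions the posterior probability that Steve is a librarian is 1/6, so farmer remains the more likely profession even though the description resembles the librarian stereotype.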

The availability heuristic [9, 16] is related to judging the probability of events based on the ease with which we retrieve instances of these events from memory. For example, to evaluate the likelihood that a person under the age of 30 years will suffer a heart attack, people usually do a quick search in their memory for cases they know of young people who have suffered a heart attack. This heuristic is useful because instances of larger classes are easier to remember than instances of smaller classes. However, availability is affected by factors other than the frequency or probability of events. For example, one may overestimate the probability of a young person getting cancer based on how recently an instance of such an event has occurred in one's life.

The anchoring and adjustment heuristic [16] occurs when people judge probabilities based on an initial value, which is adjusted until the final response is reached. The problem with this heuristic is that the adjustments are usually insufficient. In other words, the expert assessment is likely to fluctuate around the initial anchor provided. It is worth noting that an anchor may be embedded in the formulation of a question to the domain expert (i.e., when a starting point is given), but it can also be the result of an incomplete computation.

In short, heuristics are mental shortcuts that reduce the cognitive effort of reasoning about the probability of uncertain events. Although useful, they have disadvantages that must be considered in the knowledge elicitation process. Therefore, it is imperative to acknowledge the possible biases derived from heuristics during the process of probability assessment, explicitly informing the experts of their existence and adopting appropriate methods to reduce their effects.

Given the effort required from the experts, eliciting the large number of probabilities needed to construct an NPT is likely to be affected by some bias. The semiautomatic methods reduce the number of questions to be asked to the expert or entirely remove the need for direct evaluation of probabilities during the construction of the NPT, which makes it easier for the facilitator and the expert to deal with these heuristics during the elicitation process, seizing the benefits of the heuristics and reducing their possible negative effects.
