**A Bayesian Approach for Calibrating Risk Assessment Models**

Michael S. Williams<sup>1</sup>, Eric D. Ebel<sup>1</sup>, Jennifer A. Hoeting<sup>2</sup> and James L. Withee<sup>3</sup> *<sup>1</sup>Food Safety and Inspection Service, United States Department of Agriculture, Fort Collins, Colorado; <sup>2</sup>Department of Statistics, Colorado State University, Fort Collins, Colorado; <sup>3</sup>GigaYeast Inc., Belmont, California, USA*

#### **1. Introduction**


Monte Carlo simulation is a commonly used tool for constructing foodborne pathogen risk assessment models. Monte Carlo simulation enables an analyst to construct a probabilistic model of almost any desired complexity. It requires relatively little mathematical rigor and the models can be presented in an intuitive manner. It has some drawbacks, however. For example, Monte Carlo simulation requires that each parameter, as well as its uncertainty, be quantitatively described.

The models are typically used to project possible outcomes. In food-safety risk assessment applications, we typically construct a model to predict the number of human illnesses in the population. These calculations are based on the prevalence of contaminated production units and their microbial load, both of which are tracked through food production, consumer handling and consumption. The final step converts predicted contamination into a human health impact via a dose-response model.

Foodborne illness is often the result of an acute microbial pathogen exposure. More than 75 countries have implemented surveillance systems to monitor occurrences of these illnesses (Allos et al., 2004; de Jong B & K., 2006; Herikstad et al., 2002). These surveillance systems do not capture every case of foodborne illness, so scaling factors are developed to estimate the total number of illnesses for the pathogen of interest (Ebel et al., 2012; Scallan et al., 2011). Additional scaling factors can be developed to extend these estimates to a specific product-pathogen pair (Hald et al., 2004).

A conundrum for risk assessors occurs when the illness estimates from a Monte Carlo-based risk assessment model do not match the estimates based on the surveillance data. When these two estimates do not match, the risk assessment model must be calibrated. A common approach to calibrating the model is to adjust the parameters of the dose-response function so that predicted illnesses match observed illnesses. An alternative calibration approach is to replace components of the model, such as the models used to calculate pathogen attenuation during cooking. The concern with these calibration approaches is their lack of objectivity and rigor.


Bayesian methods offer an alternative approach to the problem of calibration. Various Bayesian methods have been used in previous risk assessment applications (Albert et al., 2008; Hald et al., 2004; Parsonsa et al., 2005). The Bayesian models are similar in their structure to a Monte Carlo model. They do, however, offer the advantage that any data available at a stochastic node can be incorporated into the model in the form of a prior distribution. These methods allow the user to incorporate the data for observed human illnesses. The model then produces a Bayesian revision of the system's parameter estimates. A consequence of conditioning inferences on the human illness data is that the parameter distributions for each node of the model are shifted so that the predictions more closely match the observed illness data. These adjusted distributions can be thought of as posterior distributions, though some Bayesian methods use the terms pre- and post-model distributions (Givens, 1993; Raftery et al., 1995). The direction and degree to which each of the prior distributions are shifted is to a large extent determined by the relative degrees of uncertainty in the prior distributions (i.e., the parameters in the model that are highly uncertain will experience the largest degree of adjustment in the process of calibrating the model).

This chapter will focus on introducing Bayesian methods for use in food-safety risk assessment. The use of Bayesian methods requires first establishing a simple probabilistic model. Once a model is established, a number of different Bayesian techniques can be used for drawing inferences. We will introduce a relatively simple resampling algorithm that can be used to calibrate a food-safety risk assessment model. A number of examples of the application of this framework can be found in the literature (Williams & Ebel, 2012; Williams, Ebel & Hoeting, 2011; Williams, Ebel & Vose, 2011a) so we will present a rather unusual example where the probabilistic model and Bayesian resampling method are used to study the laboratory test sensitivity of a test for *Escherichia coli* O157:H7 in ground beef samples.

#### **2. Probabilistic models for risk assessment**

The probabilistic model we consider assumes that interest lies in modeling a count of events during a specified time period. For food-safety applications, this count will usually be the number of illnesses observed (*Iobserved*) by a surveillance system during a single year.

We assume the count of sporadic illnesses detected by a surveillance system is reasonably modeled as a Poisson random variable. Our model assumes that each of the *Nservings* food servings of the commodity of interest has a probability *P*(*ill*) of causing illness. The product *Nservings* × *P*(*ill*) is the rate parameter of a Poisson distribution that describes the total number of illnesses for the product-pathogen pair of interest.

Two factors relate the total number of illnesses for a single product-pathogen pair to the number of illnesses observed by the surveillance system. The first factor describes the proportion of illnesses, *α*, attributed to the product of interest. This attribution factor modifies a rate parameter for a specific product to describe the illness rate for the product-pathogen pairing of interest. The second factor describes the proportion of illnesses, *ρ*, that are reported by the surveillance system. These factors modify the product-pathogen rate parameter to describe the number of observed illnesses whose etiology is the pathogen. This leads to the basic model for observed illnesses being

$$I\_{\text{observed}} \sim \text{Poisson}\left(\frac{\rho}{\alpha} N\_{\text{servings}} P(ill)\right). \tag{1}$$

Other factors may be needed to relate the number of observed illnesses to the total number of illnesses. For example, the surveillance system may only cover a fraction of the population or a pathogen may be specific to a single product (e.g., BSE cases are associated with beef consumption so *α* = 1). Thus, the inclusion of the adjustment terms will be specific to each surveillance system and product-pathogen pair. For this reason, the adjustment factor(s) will not be included for the remainder of the general model development and we note that *NservingsP*(*ill*) is the rate parameter describing the illness rate in the population, denoted *λill*. An extensive development of these scaling factors can be found in Williams & Ebel (2012) as well as Ebel et al. (2012).
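As a rough numerical illustration of Equation 1, the sketch below computes the population illness rate *λill* and the corresponding rate for observed illnesses. The function name and all input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def observed_illness_rate(n_servings, p_ill, alpha, rho):
    """Rate parameter of Equation 1: (rho / alpha) * N_servings * P(ill)."""
    if not (0 < alpha <= 1 and 0 <= rho <= 1):
        raise ValueError("alpha must be in (0, 1] and rho in [0, 1]")
    lambda_ill = n_servings * p_ill  # illness rate for the product-pathogen pair
    return (rho / alpha) * lambda_ill


# Hypothetical inputs, not taken from any surveillance system.
n_servings = 1_000_000_000  # servings consumed per year
p_ill = 1e-6                # probability of illness per serving
alpha = 0.5                 # proportion of illnesses attributed to the product
rho = 0.1                   # proportion of illnesses reported by surveillance

lambda_ill = n_servings * p_ill
lambda_observed = observed_illness_rate(n_servings, p_ill, alpha, rho)
print(f"lambda_ill = {lambda_ill:.0f}, observed rate = {lambda_observed:.0f}")
```

Setting `alpha = 1` and `rho = 1` recovers *λill* itself, which is the simplification adopted for the remainder of the general model development.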

In many applications, the objective of the risk assessment is to predict the change in the number of human illnesses that would occur if the production process were improved. This improvement is expected to reduce *P*(*ill*), and the resulting reduction in illnesses can be modeled as

$$I\_{\text{avoided}} \sim \text{Poisson}\left(N\_{\text{servings}}\left(P(ill) - P\_{\text{new}}(ill)\right)\right),\tag{2}$$

where *Pnew*(*ill*) is the reduced probability of illness.
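To make Equation 2 concrete, the following sketch computes the Poisson rate for avoided illnesses and a central 95% interval obtained by accumulating the Poisson probability mass function. The helper names and the probabilities are illustrative assumptions, and only the Python standard library is used.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_quantile(q, lam):
    """Smallest k whose cumulative probability is at least q."""
    k, cumulative = 0, poisson_pmf(0, lam)
    while cumulative < q:
        k += 1
        cumulative += poisson_pmf(k, lam)
    return k


# Hypothetical inputs: a process change lowers P(ill) from 1.0e-6 to 0.8e-6.
n_servings = 100_000_000
p_ill, p_new_ill = 1.0e-6, 0.8e-6

rate_avoided = n_servings * (p_ill - p_new_ill)  # rate parameter in Equation 2
lower = poisson_quantile(0.025, rate_avoided)
upper = poisson_quantile(0.975, rate_avoided)
print(f"expected avoided illnesses: {rate_avoided:.1f}")
print(f"central 95% interval: {lower} to {upper}")
```

The interval reflects only Poisson variability for fixed inputs; uncertainty about *P*(*ill*) and *Pnew*(*ill*) themselves would widen it.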

#### **3. Partitioning** *P*(*ill*)


The *P*(*ill*) term in Equation 1 is one of the typical outputs of a Monte Carlo risk assessment model. Efforts to reduce the complexity of a risk assessment model begin with expanding this term and looking for biologically plausible situations where a simpler model is appropriate. Model simplifications of quantitative microbial risk assessments often begin from first principles: microbial contamination begets food exposure begets illness. In this approach, the interest is in determining the unconditional probability of illness among all servings.

It is also possible to derive these simplifications by applying Bayes' formula. In this case, the question to be answered is "what is the probability that an illness occurred given exposure to a contaminated food?" Using Bayesian language, the answer to this question is termed a posterior distribution, or *P*(*ill*|*exp*). This is a conditional probability statement. From Bayes' Theorem, we have

$$P(ill|exp)P(exp) = P(exp|ill)P(ill) \tag{3}$$

$$P(ill|exp) = \frac{P(exp|ill)P(ill)}{P(exp)},\tag{4}$$

where *P*(*exp*) = *P*(*exp*|*ill*)*P*(*ill*) + *P*(*exp*|*ill*<sup>*C*</sup>)*P*(*ill*<sup>*C*</sup>) and *ill*<sup>*C*</sup> denotes the complement of *ill* (a serving that does not cause illness).

That the probability of exposure must be this sum can be appreciated from the simple Venn diagram in Figure 1 (i.e., the fraction of exposure servings includes those with and without illness).

This diagram also illustrates the triviality of the conditional probability *P*(*exp*|*ill*). Because all illnesses result from exposure to a contaminated serving, the conditional probability of exposure given that a serving causes illnesses is unity. This conclusion generates a simpler calculation of the posterior probability that we are interested in, namely

$$P(ill|exp) = \frac{P(ill)}{P(exp)}.\tag{5}$$
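The step from Equation 4 to Equation 5 can be checked with simple serving counts arranged as in Figure 1, where every illness serving is also an exposure serving. The counts below are invented for illustration, and exact fractions are used so that the three routes to *P*(*ill*|*exp*) can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical serving counts; illness servings are a subset of exposure servings.
all_servings = 10_000
exposure_servings = 500
illness_servings = 25

p_ill = Fraction(illness_servings, all_servings)
p_exp = Fraction(exposure_servings, all_servings)
p_exp_given_ill = Fraction(1)  # every illness arises from a contaminated serving

# P(exp | no illness): exposed servings without illness among all non-illness servings.
p_exp_given_not_ill = Fraction(exposure_servings - illness_servings,
                               all_servings - illness_servings)

# The law of total probability recovers P(exp), as in the sum following Equation 4.
p_exp_total = p_exp_given_ill * p_ill + p_exp_given_not_ill * (1 - p_ill)

bayes = p_exp_given_ill * p_ill / p_exp_total  # Equation 4
simplified = p_ill / p_exp                     # Equation 5
direct = Fraction(illness_servings, exposure_servings)

print(bayes, simplified, direct)
```

All three quantities agree, which is the content of Equation 5: once *P*(*exp*|*ill*) equals one, the posterior is simply the ratio of illness servings to exposure servings.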

Fig. 1. Venn diagram describing the probability of exposure *P*(*exp*) and the probability of illness for contaminated servings (*P*(*ill*|*exp*)). The regions of the diagram are labeled All servings, Exposure servings and Illness servings.

A variation on this equation was employed by Bartholomew et al. (2005) to derive a linear risk model. In that example, the ratio of the estimated number of illnesses and number of exposures per annum derives a constant of proportionality that is ultimately used to project changes in illnesses from intentional changes in exposure. It should be noted that numbers of illnesses and exposures are simple transformations of *P*(*ill*) and *P*(*exp*) where each is multiplied by the number (or mass) of servings of a food consumed per year.
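The linear projection described above amounts to scaling illnesses by a fixed illnesses-per-exposure ratio. A minimal sketch follows; the function name and the figures are hypothetical and are not taken from Bartholomew et al. (2005).

```python
def project_illnesses(baseline_illnesses, baseline_exposures, new_exposures):
    """Project annual illnesses under a linear (proportional) risk model."""
    if baseline_exposures <= 0:
        raise ValueError("baseline_exposures must be positive")
    illnesses_per_exposure = baseline_illnesses / baseline_exposures
    return illnesses_per_exposure * new_exposures


# Hypothetical annual estimates.
baseline_illnesses = 2_000
baseline_exposures = 400_000
new_exposures = 300_000  # exposures after an intervention

projected = project_illnesses(baseline_illnesses, baseline_exposures, new_exposures)
print(f"projected illnesses: {projected:.0f}")
print(f"illnesses avoided: {baseline_illnesses - projected:.0f}")
```

Because both counts share the same number of servings, the ratio of illnesses to exposures equals *P*(*ill*)/*P*(*exp*), so the constant of proportionality here plays the role of *P*(*ill*|*exp*).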

In microbial risk assessments, we usually have prior information about the number of illnesses per annum (i.e., *Nservings* × *P*(*ill*), where *Nservings* is the number of exposure units), as well as prior information about the fraction of exposure units that are contaminated (i.e., *P*(*exp*)). This evidence can be used to solve

$$P(ill|exp) = \frac{P(ill)}{P(exp)}\tag{6}$$

using Monte Carlo methods. The result is a posterior distribution of this conditional probability.
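A minimal Monte Carlo sketch of Equation 6 is given below. The prior distributions (a gamma distribution for annual illnesses and a beta distribution for *P*(*exp*)), their parameters, and the number of servings are all assumptions made for illustration. The two inputs are sampled independently, a simplification whose consequences are taken up in the discussion that follows.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_SERVINGS = 1_000_000  # hypothetical number of exposure units per year
N_DRAWS = 10_000


def draw_p_ill_given_exp():
    """One Monte Carlo draw of P(ill | exp) = P(ill) / P(exp)."""
    illnesses = random.gammavariate(100.0, 2.0)  # assumed prior, mean 200 illnesses
    p_ill = illnesses / N_SERVINGS
    p_exp = random.betavariate(50.0, 950.0)      # assumed prior, mean prevalence 0.05
    return p_ill / p_exp


draws = sorted(draw_p_ill_given_exp() for _ in range(N_DRAWS))
median = statistics.median(draws)
lower, upper = draws[int(0.025 * N_DRAWS)], draws[int(0.975 * N_DRAWS)]
print(f"median P(ill|exp) = {median:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

The sorted draws give an empirical distribution for *P*(*ill*|*exp*); percentiles are read directly from the ordered sample.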

Contemplating this calculation, however, highlights a potential problem in using the posterior distribution to make risk projections that might result from changes in exposure. Because this posterior is derived as a ratio of two random variables (each describing uncertainty about a true parameter in nature), this distribution is necessarily informed by any covariance between these random variables. Furthermore, projections about future values of *P*(*ill*) derived by *Pnew*(*exp*)*P*(*ill*|*exp*) need to account for the starting value of *P*(*exp*).

Fig. 2. Histogram for a simplistic Monte Carlo calculation of *P*(*ill*|*exp*) (horizontal axis: *P*(*ill*|*exp*); vertical axis: frequency).

A simple example may illustrate this point. Assume available prior evidence about numbers of illnesses (for a particular product-pathogen pair) is that 100, 200 or 300 cases occur per year with equal probability weights. Further, assume our understanding of exposures implies that 1%, 5% or 10% of 100,000 servings per year are contaminated with equal probability weights (i.e., 1000, 5000 or 10,000 exposures). A naive estimate of *P*(*ill*|*exp*) would look like the histogram in Figure 2.
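The naive calculation can be reproduced by enumerating the nine equally likely pairings of illnesses and exposures rather than sampling them. The sketch assumes 100,000 servings per year, so that the three prevalence levels correspond to 1000, 5000 and 10,000 contaminated servings, which are the exposure counts that appear in the ratios discussed in the text. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_SERVINGS = 100_000         # assumed servings per year
illnesses = [100, 200, 300]  # equally weighted prior values
prevalences = [Fraction(1, 100), Fraction(5, 100), Fraction(10, 100)]
exposures = [int(p * N_SERVINGS) for p in prevalences]  # contaminated servings

# Treat the two priors as independent: every pairing is equally likely.
ratios = [Fraction(i, e) for i, e in product(illnesses, exposures)]
weights = Counter(ratios)

for value in sorted(weights):
    print(f"P(ill|exp) = {float(value):.2f}  weight = {weights[value]}/{len(ratios)}")
print(f"range: {float(min(ratios)):.2f} to {float(max(ratios)):.2f}")
```

The value 0.02 arises from two different pairings (100/5000 and 200/10,000), and most of the probability mass sits at small ratios, so the naive distribution is strongly right-skewed rather than uniform.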

This ratio ranges from 0.01 (100 illnesses divided by 10,000 exposures) to 0.30 (300 illnesses divided by 1000 exposures). Furthermore, we can re-derive the uniform distribution of *P*(*ill*) by simply multiplying the vector of *P*(*ill*|*exp*) by the vector of *P*(*exp*). But, if we consider

contamination following production (*X*) into measurements of human health risk at the point

A Bayesian Approach for Calibrating Risk Assessment Models 303

how changes in the production process would lead to a change the number of illnesses, then the distribution of *D* can be derived from a single component Δ, that describes the cumulative change in average microbial level between production and consumption (i.e., it combines the effects of storage time and temperature as well as cooking and other process). Assuming

The dose-dependent model can be simplified by treating Δ as a latent variable, with its

Additional simplifications of the model are possible in situations where pathogen numbers

For the model in Equation 1, the number of illnesses avoided by reducing the prevalence of contaminated servings is readily predicted via Equation 2. Reduced prevalence of contamination might occur via changes in import practices or improved animal husbandry practices that reduce the occurrence of a pathogen among farms, herds, flocks or sheds. These changes are expected to reduce the prevalence of contaminated carcasses, but in a number of situations it is still reasonable be assumed that *P*(*ill*|*exp*) would remain essentially

For example, suppose that a country, where a specific pathogen is endemic, will begin importing animal products from a country that is free from the disease. If the importation of uncontaminated carcasses is such that prevalence is reduced by *Pnew*(*exp*) = *δP*(*exp*) where uncertainty about change in prevalence might be characterized as *δ* ∼ *Beta*(*a*, *b*,), and it is reasonable to assume that *P*(*ill*|*exp*) will remain unchanged, then the human health benefit is

Note that this model relies only on the characterization of the number of illnesses (*λill*) and the effect of the change in importation policy. Also note that measures of prevalence are not necessarily the prevalence of contaminated servings. Instead, one can argue that the prevalence of contaminated units at the point of data collection is proportional to the prevalence of contaminated servings. This constant of proportionality cancels out when *P*(*exp*) and *Pnew*(*exp*) are measured at the same location in the farm-to-table continuum.

This formulation also obviates the need for modeling pathogen levels as well as eliminating the need to adjust for the difference between true and apparent prevalence. A linear relationship between contaminated carcass prevalence and human illnesses was

*Iavoided* ∼ *Poisson*(*Nservings*(*P*(*exp*)*P*(*ill*|*exp*) − *Pnew*(*exp*)*P*(*ill*|*exp*)) (8)

∼ *Poisson*((1 − *δ*)*λill*). (10)

*<sup>P</sup>*(*exp*) )*λill*) (9)

are uniformly low at the point of consumption (Williams, Ebel & Vose, 2011b).

<sup>Δ</sup>) estimated during calibration. Williams, Ebel & Vose (2011a) provide an

that the cumulative change is distributed as <sup>Δ</sup> <sup>∼</sup> *Lognormal*(*μ*Δ, *<sup>σ</sup>*<sup>2</sup>

example based on *Campylobacter* contamination in chicken.

<sup>∼</sup> *Poisson*((<sup>1</sup> <sup>−</sup> *Pnew*(*exp*)

*<sup>X</sup>*) and the focus of a risk assessment is to determine

<sup>Δ</sup>), the distribution for *D* is

of consumption (*D*).

*Lognormal*(*μ<sup>X</sup>* + *μ*Δ,

parameters (*μ*Δ, *σ*<sup>2</sup>

unchanged.

modeled as:

**4.2 Prevalence-dependent model**

If it is assumed *<sup>X</sup>* <sup>∼</sup> *Lognormal*(*μX*, *<sup>σ</sup>*<sup>2</sup>

 *σ*2 *<sup>X</sup>* <sup>+</sup> *<sup>σ</sup>*<sup>2</sup> Δ).

these two independent random variables, we will generate a distribution for *P*(*ill*) that is not at all what the prior *P*(*ill*) looked like.

Instead, the appropriate distribution for *P*(*ill*|*exp*) is a distribution particular to the value of *P*(*exp*). For example, *P*(*ill*|*exp*) for *P*(*exp*) = 0.01 is a discrete Uniform distributions of values (100/1000, 200/1000, 300/1000).

If the analyst wants to predict the effect of a change in *P*(*exp*), this dependence between *P*(*exp*) and *P*(*ill*|*exp*) should be borne in mind. Otherwise, incorrect representations of *P*(*ill*) could result. Fortunately, the model simplifications developed in Williams, Ebel & Vose (2011a) avoid this trap because the *P*(*ill*|*exp*) often cancels out of the equations and the change in illness occurrence can be estimated directly from changes in *P*(*exp*). The term *prevalence-dependent* model is used to describe applications where this simplification is feasible.

#### **4. Model simplification**

A complete evaluation of the components of the model in Equation 1 can still be a complex task. Nevertheless, the factorization on *P*(*ill*) into its exposure component (*P*(*exp*)) and hazard characterization component (*P*(*ill*|*exp*)) leads to situations where estimation of the number of illnesses can be greatly simplified. We outline two different models and describe methods for simplification.

#### **4.1 Dose-dependent model**

The first parameterization assumes that all servings have some level of contamination, where *D* describes the average number of pathogens in each serving. Note that when *D* describes an average concentration, it is possible for these concentration values to be much less than 1 unit per serving. Common examples are the description of pathogen levels in water. It may also be reasonable to model average concentrations for liquid and ground food products where no natural units exist. An exposure event from a particular food type will involve the ingestion of a random number of pathogenic organisms, where the distribution of organisms is described by the probability density *f*(*D*). The lognormal distribution is a common and convenient choice (Limpert et al., 2001), so *<sup>f</sup>*(*D*) <sup>∼</sup> *Lognormal*(*μD*, *<sup>σ</sup>*<sup>2</sup> *<sup>D</sup>*). The probability that a random person will become ill, given a microbial dose of size *D*, is *P*(*ill*|*D*). Averaging across all possible doses yields the probability of a person becoming ill given exposure to the pathogen. When *D* describes an average dose, the probability of illness given exposures described by a continuous dose distribution is

$$P(ill) = \int\_0^\infty P(ill|D)f(D)dD,\tag{7}$$

where *P*(*ill*|*D*) is the dose-response function. The exponential and beta-Poisson dose-response functions are appropriate for continuous dose distributions. The term dose-dependent model will be used to denote this model.

The difficulty with this model is that data describing the dose at the point of consumption are not available. Instead, virtually all risk assessment models rely on a measurement of contamination, *X*, derived from data collected at a more convenient location in the farm-to-table continuum, such as during production or at retail. A typical risk assessment must rely on models of post-production activities to transforms measurements of microbial contamination following production (*X*) into measurements of human health risk at the point of consumption (*D*).

If it is assumed *<sup>X</sup>* <sup>∼</sup> *Lognormal*(*μX*, *<sup>σ</sup>*<sup>2</sup> *<sup>X</sup>*) and the focus of a risk assessment is to determine how changes in the production process would lead to a change the number of illnesses, then the distribution of *D* can be derived from a single component Δ, that describes the cumulative change in average microbial level between production and consumption (i.e., it combines the effects of storage time and temperature as well as cooking and other process). Assuming that the cumulative change is distributed as <sup>Δ</sup> <sup>∼</sup> *Lognormal*(*μ*Δ, *<sup>σ</sup>*<sup>2</sup> <sup>Δ</sup>), the distribution for *D* is *Lognormal*(*μ<sup>X</sup>* + *μ*Δ, *σ*2 *<sup>X</sup>* <sup>+</sup> *<sup>σ</sup>*<sup>2</sup> Δ).

The dose-dependent model can be simplified by treating Δ as a latent variable, with its parameters (*μ*Δ, *σ*<sup>2</sup> <sup>Δ</sup>) estimated during calibration. Williams, Ebel & Vose (2011a) provide an example based on *Campylobacter* contamination in chicken.

Additional simplifications of the model are possible in situations where pathogen numbers are uniformly low at the point of consumption (Williams, Ebel & Vose, 2011b).

these two independent random variables, we will generate a distribution for *P*(*ill*) that is not at all what the prior *P*(*ill*) looked like.

Instead, the appropriate distribution for *P*(*ill*|*exp*) is a distribution particular to the value of *P*(*exp*). For example, *P*(*ill*|*exp*) for *P*(*exp*) = 0.01 is a discrete Uniform distribution over the values (100/1000, 200/1000, 300/1000).

If the analyst wants to predict the effect of a change in *P*(*exp*), this dependence between *P*(*exp*) and *P*(*ill*|*exp*) should be borne in mind. Otherwise, incorrect representations of *P*(*ill*) could result. Fortunately, the model simplifications developed in Williams, Ebel & Vose (2011a) avoid this trap because *P*(*ill*|*exp*) often cancels out of the equations and the change in illness occurrence can be estimated directly from changes in *P*(*exp*). The term *prevalence-dependent* model is used to describe applications where this simplification is feasible.

#### **4. Model simplification**

A complete evaluation of the components of the model in Equation 1 can still be a complex task. Nevertheless, the factorization of *P*(*ill*) into its exposure component (*P*(*exp*)) and hazard characterization component (*P*(*ill*|*exp*)) leads to situations where estimation of the number of illnesses can be greatly simplified. We outline two different models and describe methods for simplification.

#### **4.1 Dose-dependent model**

The first parameterization assumes that all servings have some level of contamination, where *D* describes the average number of pathogens in each serving. Note that when *D* describes an average concentration, it is possible for these concentration values to be much less than 1 unit per serving. A common example is the description of pathogen levels in water. It may also be reasonable to model average concentrations for liquid and ground food products where no natural units exist. An exposure event from a particular food type will involve the ingestion of a random number of pathogenic organisms, where the distribution of organisms is described by the probability density *f*(*D*). The lognormal distribution is a common and convenient choice (Limpert et al., 2001), so *f*(*D*) ∼ *Lognormal*(*μ<sub>D</sub>*, *σ<sub>D</sub>*<sup>2</sup>). The probability that a random person will become ill, given a microbial dose of size *D*, is *P*(*ill*|*D*). Averaging across all possible doses yields the probability of a person becoming ill given exposure to the pathogen. When *D* describes an average dose, the probability of illness given exposures described by a continuous dose distribution is

$$P(\text{ill}) = \int_{0}^{\infty} P(\text{ill}|D)\, f(D)\, dD, \tag{7}$$

where *P*(*ill*|*D*) is the dose-response function. The exponential and beta-Poisson dose-response functions are appropriate for continuous dose distributions. The term dose-dependent model will be used to denote this model.

The difficulty with this model is that data describing the dose at the point of consumption are not available. Instead, virtually all risk assessment models rely on a measurement of contamination, *X*, derived from data collected at a more convenient location in the farm-to-table continuum, such as during production or at retail. A typical risk assessment must rely on models of post-production activities to transform measurements of microbial

#### **4.2 Prevalence-dependent model**

For the model in Equation 1, the number of illnesses avoided by reducing the prevalence of contaminated servings is readily predicted via Equation 2. Reduced prevalence of contamination might occur via changes in import practices or improved animal husbandry practices that reduce the occurrence of a pathogen among farms, herds, flocks or sheds. These changes are expected to reduce the prevalence of contaminated carcasses, but in a number of situations it is still reasonable to assume that *P*(*ill*|*exp*) would remain essentially unchanged.

For example, suppose that a country where a specific pathogen is endemic begins importing animal products from a country that is free from the disease. If the importation of uncontaminated carcasses reduces prevalence to *P<sub>new</sub>*(*exp*) = *δP*(*exp*), where uncertainty about the change in prevalence might be characterized as *δ* ∼ *Beta*(*a*, *b*), and it is reasonable to assume that *P*(*ill*|*exp*) will remain unchanged, then the human health benefit is modeled as:

$$I_{\text{avoided}} \sim \text{Poisson}\big(N_{\text{servings}}(P(\text{exp})P(\text{ill}|\text{exp}) - P_{\text{new}}(\text{exp})P(\text{ill}|\text{exp}))\big) \tag{8}$$

$$
\sim \text{Poisson}\left(\left(1 - \frac{P_{\text{new}}(\text{exp})}{P(\text{exp})}\right)\lambda_{\text{ill}}\right) \tag{9}
$$

$$\sim \text{Poisson}((1-\delta)\lambda_{\text{ill}}). \tag{10}$$

Note that this model relies only on the characterization of the number of illnesses (*λ<sub>ill</sub>*) and the effect of the change in importation policy. Also note that measures of prevalence are not necessarily the prevalence of contaminated servings. Instead, one can argue that the prevalence of contaminated units at the point of data collection is proportional to the prevalence of contaminated servings. This constant of proportionality cancels out when *P*(*exp*) and *P<sub>new</sub>*(*exp*) are measured at the same location in the farm-to-table continuum.
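As a rough illustration of Equation 10, the Python sketch below propagates uncertainty in *δ* through to the number of illnesses avoided. It is a minimal example under stated assumptions rather than the authors' implementation: the values of `LAMBDA_ILL`, `A` and `B` are hypothetical, and because the Python standard library has no Poisson sampler, the helper `poisson_draw` uses Knuth's multiplication method, which is only suitable for moderate Poisson means.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method.

    Adequate for moderate means only (exp(-mean) must not underflow).
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_illnesses_avoided(lambda_ill, a, b, n_draws, seed=1):
    """Simulate I_avoided ~ Poisson((1 - delta) * lambda_ill), delta ~ Beta(a, b)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        delta = rng.betavariate(a, b)  # uncertain ratio P_new(exp) / P(exp)
        draws.append(poisson_draw(rng, (1.0 - delta) * lambda_ill))
    return draws


# Hypothetical inputs, chosen only for illustration.
LAMBDA_ILL = 500.0  # expected annual illnesses before the change
A, B = 8.0, 2.0     # Beta parameters for delta (mean 0.8)

avoided = simulate_illnesses_avoided(LAMBDA_ILL, A, B, n_draws=20_000)
print("simulated mean illnesses avoided:", statistics.fmean(avoided))
print("analytic mean:", (1.0 - A / (A + B)) * LAMBDA_ILL)
```

The simulated mean should lie close to (1 − *a*/(*a* + *b*))*λ<sub>ill</sub>*, while the spread of the draws reflects both Poisson variability and uncertainty about *δ*.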

This formulation also obviates the need to model pathogen levels and to adjust for the difference between true and apparent prevalence. A linear relationship between contaminated carcass prevalence and human illnesses was

4. Draw an unequal probability with-replacement sample of size *m* << *N* from (*θ*<sub>1</sub>, *θ*<sub>2</sub>, ..., *θ<sub>N</sub>*)

As *N*/*m* → ∞ the SIR algorithm produces an exact sample from the posterior distribution. Previous studies have found that values for *N*/*m* ranging from 20 to 40 are often sufficient (Rubin, 1987), but appropriate values must be considered on a case-by-case basis.

To illustrate the SIR algorithm, consider the problem of estimating the prevalence *θ* of a disease in a herd of animals when new sampling evidence is combined with prior information. Suppose the new evidence is a sample from the herd of size *n* = 20, of which *s* = 4 samples are positive. Suppose the prior evidence on the prevalence in the herd can be summarized by a beta distribution of the form *θ* ∼ *Beta*(*a* = 1, *b* = 6). In this example the model, *M*(*θ*), uses the prior information on prevalence, and the number of additional test results, to predict the number of infected animals. Using the model *s* ∼ *Binomial*(*n*, *θ*) and Bayes' formula, it is known that the resulting posterior distribution is *p*(*θ*|*s*, *n*) ∼ *Beta*(*s* + *a*, *n* − *s* + *b*) = *Beta*(5, 22). The following *R* code (R Development Core Team, 2011) demonstrates the SIR algorithm and illustrates the equivalence of the SIR solution
