
demonstrated for chicken and campylobacteriosis in a previous risk assessment (Bartholomew et al., 2005).

### **5. Bayesian methods**

Recent research interest has focused on the replacement of Monte Carlo models with a Bayesian approach that uses Markov chain Monte Carlo (MCMC) methods (Albert et al., 2008; Hald et al., 2004; Parsons et al., 2005). Along with this proposed approach comes the inevitable suggestion that the models be built using MCMC packages such as WinBUGS (Lunn et al., 2000; 2009; Vose, 2008). These packages often rely on the Gibbs sampler, or similar algorithms, to obtain a set of random samples from the posterior probability distribution of the risk assessment model. While it is possible to use software packages such as WinBUGS (Williams, Ebel & Hoeting, 2011), personal experience suggests that convergence is difficult to achieve given the high degree of uncertainty in the parameters of even a highly simplified food-safety risk assessment model. The underlying problem can be understood by examining the mechanics of an MCMC algorithm.

Sampling and numerical search algorithms generally follow two approaches. MCMC algorithms generate a new realization of the model parameters at each iteration (following a burn-in period). When the model converges, each iteration is an element or observation from the posterior distribution.
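These per-iteration mechanics can be seen in a minimal random-walk Metropolis sampler. The example below is hypothetical (a one-parameter standard-normal posterior, not the chapter's model), and Python is used purely for illustration; it shows how each post-burn-in iteration yields one observation from the posterior.

```python
import math
import random

def metropolis(log_post, theta0, n_iter=5000, burn_in=1000, step=2.0, seed=1):
    """Minimal random-walk Metropolis sampler: after the burn-in period,
    each iteration's theta is (approximately) a draw from the posterior."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_post(proposal) - log_post(theta):
            theta = proposal
        if i >= burn_in:  # discard the burn-in iterations
            draws.append(theta)
    return draws

# Hypothetical target: a standard normal posterior, log p(theta) = -theta^2/2.
draws = metropolis(lambda t: -0.5 * t * t, theta0=0.0)
print(sum(draws) / len(draws))
```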

An alternative to MCMC algorithms is to first generate a large number of candidate values for each parameter using Monte Carlo simulation. Bayesian logic then combines the new evidence, denoted by *E*, with the Monte Carlo parameter estimates to select or reweight a subset of the Monte Carlo generated parameters. In this application, the new evidence will be the illness count from a public health surveillance system (i.e., *E* = *Iobserved*).

The algorithm we employ is the sampling importance resampling (SIR) approach proposed by Rubin (1987). This method generates an unequal probability sample where the sample weights are determined by the degree of agreement between the prior information and the new sampling evidence.

To demonstrate, let *θ* represent a vector of model inputs. Examples of model inputs are parameters describing the contamination distribution and the dose-response function. The input parameters are not fixed values, so their uncertainty is represented by the distribution *p*(*θ*). This distribution is referred to as the prior, or pre-model, distribution for the inputs. Consider a process model that predicts, among other quantities, the rate parameter describing the number of human illnesses (Equation 1). Let this model be denoted by *M*(). For a randomly sampled *θ* from *p*(*θ*), the output of the process model is *M*(*θ*). The *M*(*θ*) value will be compared to the observed number of illnesses from public health surveillance, which is denoted by *Iobserved*.
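The comparison of *M*(*θ*) with *Iobserved* can be made concrete as a likelihood-based weight. The sketch below assumes, for illustration only, that the surveillance count is Poisson-distributed about the model-predicted rate; the function name and the candidate rates are hypothetical, and Python is used in place of the chapter's R.

```python
from math import exp, lgamma, log

def poisson_weight(i_observed, rate):
    """w = P(I_observed | M(theta)) under an assumed Poisson likelihood,
    where rate = M(theta) is the model-predicted illness rate."""
    return exp(i_observed * log(rate) - rate - lgamma(i_observed + 1))

# Hypothetical model outputs M(theta_i), compared with 120 observed illnesses:
# predictions closer to the observed count receive larger weights.
for rate in (80.0, 120.0, 200.0):
    print(rate, poisson_weight(120, rate))
```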

The algorithm for implementing the SIR is:




1. Draw *N* samples (*θ*1, *θ*2, ...*θN*) from the prior distribution *p*(*θ*).

2. For each *θi*, use the model to determine *M*(*θi*).

3. Determine a weight for each *M*(*θi*). This weight describes the agreement between the model prediction and the observed number of illnesses. For this application, the weight is *wi* = *P*(*Iobserved*|*M*(*θi*)).

4. Draw an unequal probability with-replacement sample of size *m* << *N* from (*θ*1, *θ*2, ...*θN*) using sample weights *wi*.
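The four steps can be sketched compactly as follows. This is a Python illustration rather than the chapter's own implementation: the one-parameter prior, the exposure model, and the Poisson form of the weights are all assumptions made for the sake of a runnable example.

```python
import random
from math import exp, lgamma, log

def sir(prior_draw, model, i_observed, N=20000, m=1000, seed=7):
    """Sampling importance resampling (Rubin, 1987).
    Steps: (1) sample theta_i from the prior, (2) run the model,
    (3) weight by P(I_observed | M(theta_i)), (4) resample with
    replacement proportional to the weights."""
    rng = random.Random(seed)
    thetas = [prior_draw(rng) for _ in range(N)]          # step 1
    rates = [model(t) for t in thetas]                    # step 2
    # Step 3: an assumed Poisson likelihood for the observed illness count.
    w = [exp(i_observed * log(r) - r - lgamma(i_observed + 1)) for r in rates]
    return rng.choices(thetas, weights=w, k=m)            # step 4

# Hypothetical example: theta is a prevalence; the model predicts an
# illness rate of theta * 1000 exposures, and 120 illnesses were observed.
post = sir(prior_draw=lambda rng: rng.uniform(1e-6, 1.0),  # flat prior, bounded away from 0
           model=lambda theta: theta * 1000.0,
           i_observed=120)
print(sum(post) / len(post))
```

With a flat prior and this Poisson likelihood, the posterior for *θ* concentrates near 120/1000 = 0.12, which the resampled values reproduce.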

As *N*/*m* → ∞ the SIR algorithm produces an exact sample from the posterior distribution. Previous studies have found that values for *N*/*m* ranging from 20 to 40 are often sufficient (Rubin, 1987), but appropriate values must be considered on a case-by-case basis.

To illustrate the SIR algorithm, consider the problem of estimating the prevalence *θ* of a disease in a herd of animals when new sampling evidence is combined with prior information. Suppose the new evidence is a sample from the herd of size *n* = 20, of which *s* = 4 samples are positive, and that the prior evidence on the prevalence in the herd can be summarized by a beta distribution of the form *θ* ∼ *Beta*(*a* = 1, *b* = 6). In this example the model, *M*(*θ*), uses the prior information on prevalence and the additional test results to predict the number of infected animals. Using the model *s* ∼ *Binomial*(*n*, *θ*) and Bayes' formula, the resulting posterior distribution is known to be *p*(*θ*|*s*, *n*) ∼ *Beta*(*s* + *a*, *n* − *s* + *b*) = *Beta*(5, 22). The following *R* code (R Development Core Team, 2011) demonstrates the SIR algorithm and illustrates the equivalence of the SIR solution and the known posterior distribution.
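As a cross-check of the R demonstration, an analogous sketch in Python, using the same prior, the same data, and the same known *Beta*(5, 22) posterior, is given here; the function name and the choices of *N* and *m* are illustrative.

```python
import random
from math import comb

def sir_beta_binomial(a=1, b=6, n=20, s=4, N=40000, m=2000, seed=11):
    """SIR for the beta-binomial example: a Beta(a, b) prior on the
    prevalence theta, binomial weights P(s | n, theta); the exact
    posterior is Beta(a + s, b + n - s) = Beta(5, 22)."""
    rng = random.Random(seed)
    thetas = [rng.betavariate(a, b) for _ in range(N)]          # step 1
    w = [comb(n, s) * t**s * (1 - t)**(n - s) for t in thetas]  # steps 2-3
    return rng.choices(thetas, weights=w, k=m)                  # step 4

post = sir_beta_binomial()
print(sum(post) / len(post))  # compare with the exact posterior mean 5/27
```

The mean of the resampled values should agree closely with the *Beta*(5, 22) mean of 5/27 ≈ 0.185, illustrating the equivalence claimed above.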

```