Surveillance sensitivity, in the context of testing for pathogens in food, is the probability that the pathogen is detected given that it exists in a sampled unit. Surveillance sensitivity is a function of test sensitivity in the sense that not only does a contaminated unit need to be sampled, but the results of testing must properly classify the unit as containing the pathogen of interest. When the sensitivity of a test is *Se*, the number of units in the population is large relative to the number of samples collected (*n*), and *p*(*S*+) is the proportion of units that are contaminated, the surveillance system sensitivity is typically given by

$$P(\text{detecting one or more positives}) = 1 - (1 - Se \times p(S+))^n. \tag{11}$$

Note that the role of test sensitivity is to modify the true prevalence term *p*(*S*+) to provide the apparent prevalence.
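As a hypothetical illustration of equation 11: with $Se = 0.5$, $p(S+) = 0.002$ (1 contaminated unit per 500) and $n = 1{,}000$ samples, the surveillance system sensitivity is $1 - (1 - 0.5 \times 0.002)^{1000} \approx 0.63$; halving either $Se$ or $p(S+)$ drops it to roughly $0.39$.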

The concern with the standard approaches used in defining *Se* and equation 11 is that test sensitivity can decrease as the level of the pathogen drops, especially in cases where the average concentration is less than 1 cfu/tested unit (e.g., a test that uses 10 g of a 100 g food unit with only 1 organism will have an average concentration of 0.1 *cfu/g*).

In testing for pathogens that occur at low levels, test sensitivity can be improved by employing enrichment techniques, increasing incubation time, and increasing the volume of material sampled. All of these methods increase the number of pathogens in the medium to be tested. The bonding and potential encapsulation of a microbe within fatty tissues, insufficient time for cells to leave a quiescent state during incubation, the possibility of cells entering a viable but nonculturable state (Oliver, 2005; Oliver et al., 2005), and the small volume of material tested can all lead to reductions in test sensitivity.

In this study, a situation is examined in which the prevalence of positive samples doubled over a one-year period. During this time, a minor modification was made to the enrichment technique used for testing. While the laboratory had performed testing to determine the equivalence of the new and old methodology, insufficient evidence exists to determine whether the observed increase in prevalence is due to the change of enrichment methodology or to an actual change in pathogen prevalence.

This study is predicated on the assumption that the observed increase in *E.coli* O157:H7 positive samples is the result of a change in enrichment media, rather than an actual increase in contamination. The analyses presented assess what the change in test sensitivity would be if this assumption were true. This initial analysis is used to specify both the required sample size and the concentration of *E.coli* O157:H7 used to spike validation samples.

#### **7. Data description**

The Food Safety and Inspection Service of the United States Department of Agriculture has been collecting ground beef samples from all slaughter and grinding facilities producing ground beef products since the beginning of the year 2000. However, the enumeration of positive samples only began in January 2007, and this more limited dataset was used in this analysis. Each facility that produces ground beef for distribution is sampled on multiple occasions every year, and the sample unit consists of approximately 900 *g* of ground beef collected at the end of production from approximately 5,000 kg lots.

#### **8. Methods**

#### **8.1 Estimating the distribution for the average concentration of** *E.coli* **O157:H7**

Ground beef at the point of production comprises no natural units, so it can be viewed as a homogeneously mixed product that mimics a viscous liquid. As such, the estimation of the levels of pathogens in ground beef follows a methodology similar to that used for describing the distribution of contaminants in other fluids, such as water. The grinding process inevitably introduces some microscopic voids in the medium, so the unit of measurement is typically grams rather than milliliters.

The sampling data are censored in the sense that the true number of organisms is not observed for all samples. This occurs because the test has a level of detection (*LOD*) at which the probability of a positive is low even though the original sample contained one or more viable organisms (Helsel, 2005).

Contamination generally occurs at very low levels for the vast majority of ground beef production. Nevertheless, there are situations where high contamination levels can occur. The commonly used, biologically plausible model assumes that the average concentration of contamination varies according to a Lognormal distribution (i.e., $X \sim Lognormal(\mu, \sigma^2)$) (Haas et al., 1999).


The parameters of the contamination distribution can be estimated using maximum likelihood. To account for the different levels of detection, the total number of samples, $N$, can be broken down into $N = N_{neg} + N_{qual+} + N_{quan+}$. The two levels of detection are given by $LOD_{qual+} = 0.003$ and $LOD_{quan+} = 0.03$ cfu/g.

The likelihood is given by

$$L = \left[F(LOD_{qual+})\right]^{N_{neg}} \times \left[F(LOD_{quan+}) - F(LOD_{qual+})\right]^{N_{qual+}} \times \prod_{i=1}^{N_{quan+}} f(x_i), \tag{12}$$

where $x_i$ is the estimated number of organisms from the MPN analysis for the $i$th quantitatively positive sample, and $F$ and $f$ represent the cumulative distribution and probability density functions of the Lognormal distribution of contamination levels. Applying the specific LOD values for this application yields

$$L = \left[F(0.003)\right]^{N_{neg}} \times \left[F(0.03) - F(0.003)\right]^{N_{qual+}} \times \prod_{i=1}^{N_{quan+}} f(x_i). \tag{13}$$

A nonlinear optimization routine (*optim* in R) was used to estimate $\hat{\mu}$ and $\hat{\sigma}^2$. This routine also provides the variance-covariance matrix, so that $var[\hat{\mu}]$, $var[\hat{\sigma}^2]$ and $cov[\hat{\mu}, \hat{\sigma}^2]$ are available for further analyses. These estimates will be used as hyper-priors describing uncertainty in the estimated pathogen levels.
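A minimal R sketch of this estimation step, assuming counts `n_neg` and `n_qual` and a vector `mpn` of enumerated values (all inputs below are illustrative placeholders, not the actual surveillance data):

```r
# Negative censored log-likelihood for the Lognormal contamination model
# (Equation 13): negatives are censored below 0.003 cfu/g, qualitative-only
# positives fall between 0.003 and 0.03, and quantitative positives enter
# through the density. par = (mu, sigma) on the log scale of the Lognormal.
negll <- function(par, n_neg, n_qual, mpn) {
  mu <- par[1]; sdlog <- par[2]
  -(n_neg  * plnorm(0.003, mu, sdlog, log.p = TRUE) +
    n_qual * log(plnorm(0.03, mu, sdlog) - plnorm(0.003, mu, sdlog)) +
    sum(dlnorm(mpn, mu, sdlog, log = TRUE)))
}

fit <- optim(par = c(-8, 2), fn = negll, n_neg = 45000, n_qual = 30,
             mpn = c(0.04, 0.11, 0.35), method = "L-BFGS-B",
             lower = c(-Inf, 1e-6), hessian = TRUE)
fit$par              # maximum likelihood estimates (mu_hat, sigma_hat)
solve(fit$hessian)   # approximate variance-covariance matrix for the hyper-priors
```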

#### **8.2 Defining test sensitivity as a function of pathogen level**

The process under which a positive test occurs requires two sequential outcomes:

1. The original 325 *g* sample must contain one or more viable organisms.
2. The DNA in a cell must be successfully amplified by PCR for detection to occur.
Two factors can affect the probability of detection. The first is that material from the single cell undergoes successful amplification. The alternative is that the cell leaves its quiescent state during the enrichment phase and begins the process of exponential growth.

Suppose there are *Z* cells in the sample and for each cell there is a probability *Q* of detection. Given the probability of detection for an individual cell, and the number of cells, the probability of a positive test is defined as the complement of non-detection for each of the *Z* organisms, therefore

$$P(T+|Q=q, Z=z) = 1 - (1-q)^z. \tag{14}$$

This probability assumes fixed values for *Q* and *Z*, though there is likely to be variation among organisms as well as variation in the number of organisms.

Suppose for each cell in the sample the probability, *Q*, of detection is distributed in the population as a Beta distribution with parameters *α* and *β*. Similarly, let *Z* represent the number of viable organisms per gram initially in the sample. Assuming that the organisms are uniformly distributed in the medium, *Z* can be modeled as a Poisson distribution. The rate parameter (*X*) for any given sample is determined by a draw from the Lognormal distribution describing *E. coli* O157:H7 levels previously described.


A two-step process is used to incorporate the variability in *Q* and *Z*. First, assume a fixed probability of detection for each organism. Applying the Theorem of Total Probability across the Poisson distributed organism counts, with rate parameter *X*, gives

$$\begin{split} P(T+|Q=q,X=x) &= \sum_{z=0}^{\infty} P(T+|Q=q,X=x,Z=z)\,P(Z=z|X=x) \\ &= \sum_{z=0}^{\infty} \left(1-(1-q)^{z}\right) \frac{e^{-x}x^{z}}{z!} \\ &= 1 - e^{-x} \sum_{z=0}^{\infty} \frac{\left((1-q)x\right)^{z}}{z!} \\ &= 1 - e^{-x} \times e^{(x-qx)} \\ &= 1 - e^{-qx}. \end{split} \tag{15}$$
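This identity is easy to verify by simulation; a quick R check with arbitrary values of $q$ and $x$:

```r
# Monte Carlo check that P(T+ | Q = q, X = x) = 1 - exp(-q * x).
set.seed(1)
q <- 0.3; x <- 2.5
z <- rpois(1e6, lambda = x)                       # organisms per sample
detected <- rbinom(1e6, size = z, prob = q) > 0   # at least one cell detected
mean(detected)      # simulated probability, approximately 0.528
1 - exp(-q * x)     # analytical value, 0.5276
```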

The probability of detection for each cell is likely to vary across the population due to factors such as genetic diversity and the level of damage the cell may have received during the production process due to the possible application of antimicrobials and steam to carcasses, as well as drying, freezing and storage times prior to sample collection and testing. Assuming that the detection probability is $Q \sim Beta(\alpha, \beta)$, the probability of detection for a sample with rate parameter $X$ is given by:

$$P(T+|X=x) = \int_0^1 P(T+|Q=q, X=x) f(Q=q)\, dq = \int_0^1 (1-e^{-qx}) \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} q^{\alpha-1} (1-q)^{\beta-1}\, dq. \tag{16}$$

The integral in this expression describes a confluent hypergeometric function. Using the results of Haas et al. (1999), the exact solution can be written as

$$P(T+|X=x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)} \sum_{i=1}^{\infty} \frac{\Gamma(\alpha+i)}{\Gamma(\alpha+\beta+i)} \frac{(-1)^{i-1}x^{i}}{i!}. \tag{17}$$

The ultimate goal is to determine the probability of a positive test across the distribution of possible contamination levels. If $f_X(x)$ is the Lognormal probability density function describing contamination levels across the population of ground beef, then

$$P(T+) = \int_0^\infty P(T+|X=x)\, f_X(x)\, dx, \tag{18}$$

where $P(T+|X=x)$ is as defined in either 16 or 17. Note that the series expansion for the confluent hypergeometric in Equation 17 includes an alternating power series in $x^i$, and the range of integration for $x$ is $(0, \infty)$. This causes numerical problems for levels of $x$ greater than approximately 50 cfu and precludes the use of numerical approximations based on the series expansion.

A solution to the problem is to combine 18 and 16 as

$$P(T+) = \int_0^\infty \int_0^1 (1 - e^{-qx}) \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} q^{\alpha-1} (1-q)^{\beta-1}\, dq\, f_X(x)\, dx. \tag{19}$$
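A single evaluation of Equation 19 is straightforward with nested quadrature; a sketch in R with illustrative parameter values (note that `dbeta` supplies the Beta normalizing constant appearing in Equation 16):

```r
# P(T+) via the double integral in Equation 19.
p_pos <- function(mu, sigma, alpha, beta) {
  p_given_x <- function(x)                     # Equation 16: P(T+ | X = x)
    sapply(x, function(xi)
      integrate(function(q) (1 - exp(-q * xi)) * dbeta(q, alpha, beta),
                lower = 0, upper = 1)$value)
  integrate(function(x) p_given_x(x) * dlnorm(x, mu, sigma),
            lower = 0, upper = Inf)$value
}

p_pos(mu = -8, sigma = 2, alpha = 0.5, beta = 5)   # illustrative values
```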


The evaluation of the double integral that results from combining 18 and 16 is numerically difficult and time consuming to reliably implement in a Monte Carlo simulation. One solution is to use computer hardware and software that accommodates parallel processing, such as the *Snowfall* package in *R* (Knaus et al., 2009; Rossini et al., 2007).
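A minimal *Snowfall* pattern for spreading those evaluations across cores (a sketch: the cluster size and the list of parameter draws `theta_draws` are placeholders, and `p_pos` is the nested-quadrature helper sketched above):

```r
library(snowfall)
sfInit(parallel = TRUE, cpus = 8)   # start a local cluster
sfExport("p_pos")                   # make the helper visible on the workers
p_values <- sfSapply(theta_draws, function(th)
  p_pos(th["mu"], th["sigma"], th["alpha"], th["beta"]))
sfStop()                            # release the cluster
```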

Another alternative comes from noting that the model in Equation 16 also appears in epidemiology and food-safety applications, where it is functionally identical to a beta-Poisson dose-response function (Haas et al., 1999). In this setting the model is used to determine the probability of illness ($P(ill)$) for a Poisson distributed number of organisms with mean value $X$, with an individual dose-response of

$$P(ill|D=d) = 1 - (1-q)^d.\tag{20}$$

In this equation *P*(*ill*|*D* = *d*) is the individual probability for illness, *d* the individual dose, and *q* an individual measure of susceptibility that is distributed in the population as a beta distribution with parameters *α* and *β*. Note that this expression is equivalent to the probability of illness given exposure (*P*(*ill*|*exp*)) defined previously.

An approximation to the exact solution in 17 has been derived for the beta-Poisson dose-response equation. The approximation is

$$P(T+|X=x) = 1 - \left(1 + \frac{x}{\beta}\right)^{-\alpha}. \tag{21}$$

Furumoto & Mickey (1967) show, via a Taylor series expansion, that this approximation is sufficiently accurate for $\beta > \alpha$ and $x$ reasonably small. Prior experience indicates that the number of organisms is low for this application; however, the relationship between $\beta$ and $\alpha$ is not known *a priori*. This model produces a sigmoidal curve in log-space, and the interpretation of the two parameters is that $\alpha$ controls the slope of the curve while $\beta$ controls the location.

The ultimate goal is to determine the probability of a positive test across the distribution of possible contamination levels. If $f_X(x)$ is the Lognormal probability density function describing contamination levels across the population of ground beef, then

$$P(T+) = \int_0^\infty \left[1 - \left(1 + \frac{x}{\beta}\right)^{-\alpha}\right] f_X(x)\, dx. \tag{22}$$

Two different probabilities of a positive test are of interest. The first is as described in Equation 22, which is assumed to be the probability of a positive test prior to the change in enrichment methodology. The sensitivity of the test is greatly improved if the organism has sufficient time to undergo the process of replication. If, as postulated, the increase in the number of positive samples is a side-effect of an improvement in enrichment media, the probability of detection for each organism would be greater. The magnitude of this increase would be captured in the parameters that describe the probability of detection. Let the parameters $\alpha_{new}$ and $\beta_{new}$ represent the postulated increase in test sensitivity associated with the new enrichment technique; then the probability of a positive test is

$$P_{new}(T+) = \int_0^\infty \left[1 - \left(1 + \frac{x}{\beta_{new}}\right)^{-\alpha_{new}}\right] f_X(x)\, dx. \tag{23}$$
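With the approximation, only the outer integral over $x$ remains, which is far cheaper than the double integral of Equation 19; a minimal sketch (parameter values again illustrative):

```r
# P(T+) under the beta-Poisson approximation (Equations 22 and 23).
p_pos_bp <- function(alpha, beta, mu, sigma)
  integrate(function(x) (1 - (1 + x / beta)^(-alpha)) * dlnorm(x, mu, sigma),
            lower = 0, upper = Inf)$value

p_old <- p_pos_bp(alpha = 0.5, beta = 5.0, mu = -8, sigma = 2)   # Equation 22
p_new <- p_pos_bp(alpha = 0.5, beta = 2.5, mu = -8, sigma = 2)   # Equation 23
```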

#### **8.3 Description of surveillance data**


The goal of the analysis is to quantify the parameters $\alpha$, $\beta$, $\alpha_{new}$, $\beta_{new}$. The differences in the resulting test sensitivity models can be used to inform additional analyses related to determining the necessary sample size to test for equivalence of the two enrichment media.

The enumerated values from the surveillance data are used to characterize the distribution of contamination in ground beef (i.e., $f_X(x)$). These surveillance data from 2000-2006 represent what is thought to be a time period of relative stability in the level of contamination and are characterized by an annual rate of positive samples of approximately 1 positive for every 500 samples (0.2%).

In the following three years there were $T_{2007} = 26$, $T_{2008} = 43$ and $T_{2009} = 35$ positive tests observed. The number of tests performed was $N_{2007} = 9{,}401$, $N_{2008} = 10{,}760$ and $N_{2009} = 10{,}774$. The 2007 data represent the last year during which no changes were made to the laboratory protocols. The 2008 and 2009 data represent the first years in which the new enrichment technique was implemented, and there was an immediate increase in the number of positive samples observed, with the percent positive rate jumping from a long-term average of approximately 0.20% to 0.36%.

#### **8.4 Bayesian model**

Application of the SIR routine requires the definition of the model ($M(\theta_i)$) and the likelihood equation used to determine the sample weights (i.e., $w_i = P(E|M(\theta_i))$).

In this application, the model generates estimates of the number of positive tests during 2007 and the combined time period 2008-2009. Define these two predictions as $M_{2007}(\theta_{2007}) = N_{2007}P(T+)$ and $M_{2008-9}(\theta_{2008-9}) = (N_{2008} + N_{2009})P_{new}(T+)$. The parameter vectors consist of $\theta_{2007} = (\mu, \sigma^2, \alpha, \beta)$ and $\theta_{2008-9} = (\mu, \sigma^2, \alpha_{new}, \beta_{new})$. Uncertainty in the $\mu, \sigma^2$ parameters is modeled by a multivariate Normal distribution whose mean values and variance-covariance matrix were derived from the maximum likelihood solution of Equation 13. Uniform priors were used for the $\alpha$ and $\beta$ parameters of the test sensitivity models. Equations 22 and 23 were used to calculate $P(T+)$ and $P_{new}(T+)$, respectively.

There is no evidence of plant-level clustering amongst the positive tests collected across the years 2000-2009, so it is reasonable to assume that the number of positive tests can be modeled as a Poisson distribution, giving

$$T_{2007} \sim Poisson(N_{2007}P(T+)) \tag{24}$$

and

$$T_{2008-9} \sim Poisson((N_{2008} + N_{2009})P_{new}(T+)). \tag{25}$$

Each of these expressions serves as a likelihood function in a Bayesian model where uncertainty in $P(T+)$ and $P_{new}(T+)$ provides extra-Poisson variability. The model was solved using the SIR algorithm, where the weights $w_i$ were the product of the two likelihoods, so

$$w_i = P(T_{2007} \,|\, N_{2007}P(T+)) \times P(T_{2008-9} \,|\, (N_{2008} + N_{2009})P_{new}(T+)). \tag{26}$$

The linkage between the two likelihoods is the common distribution explaining the level of contamination. The sample sizes used in the SIR algorithm were $N = 1{,}000{,}000$ and $m = 25{,}000$.
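The SIR step itself is compact once $P(T+)$ is cheap to evaluate; a sketch under the stated priors, where `mu_hat` and `Sigma_hat` are the MLE mean vector and variance-covariance matrix from the earlier fit, `p_pos_bp` is the helper from the Equations 22-23 sketch, and the uniform prior bounds (0, 50) are assumptions:

```r
# Sampling-importance-resampling for theta = (mu, sigma2, alpha, beta, alpha_new, beta_new).
library(MASS)                                      # for mvrnorm
N <- 1e6; m <- 25e3
theta <- cbind(mvrnorm(N, mu_hat, Sigma_hat),      # (mu, sigma2) hyper-prior
               runif(N, 0, 50), runif(N, 0, 50),   # alpha, beta (bounds assumed)
               runif(N, 0, 50), runif(N, 0, 50))   # alpha_new, beta_new

w <- apply(theta, 1, function(th) {                # Equation 26 weights
  p_old <- p_pos_bp(th[3], th[4], th[1], sqrt(th[2]))   # Equation 22
  p_new <- p_pos_bp(th[5], th[6], th[1], sqrt(th[2]))   # Equation 23
  dpois(26, 9401 * p_old) * dpois(43 + 35, (10760 + 10774) * p_new)
})
posterior <- theta[sample(N, size = m, replace = TRUE, prob = w / sum(w)), ]
```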


#### **9. Results**

The test sensitivity models are essentially equivalent for extremely low concentrations and for concentrations substantially greater than the LOD, with the probability of detection being near zero at low levels and essentially one at higher concentrations (Figure 3). Any laboratory testing to determine the equivalence between the two enrichment methods could require an enormous sample size to detect a significant difference if the concentration were too high, because essentially no difference exists between the two methods when samples contain more than four or five viable cells. However, there exists a range over which the models no longer overlap (Figure 3).

[Figure 3: probability of detection as a function of mean concentration (log10 scale), with curves for $P(T+|X)$ under the old method and the new enrichment.]

Fig. 3. Summary of the test sensitivity models. The discrepancies between the two models predominantly occur at concentrations much lower than 1 cfu per g. The *LOD* is given by the solid vertical line. The dashed vertical line indicates the concentration at which the maximum difference in sensitivity between the old and new enrichment methods occurs.

The vertical line in Figure 3 represents the mean concentration at the theoretical *LOD* of a single organism. If this is interpreted to mean that the test could truly identify 1 organism in a 325 *g* sample, then there will be one or more organisms contained in the sample with probability $P(T+) = 1 - e^{-1} = 0.63$. A test with less than perfect sensitivity will have a lower probability of detection. The probabilities of detection at this concentration for both tests are substantially less than this theoretical upper bound, with the probabilities of a positive test being 0.21 and 0.40.


The level of the target organism at which the maximum difference between the two tests occurs can be estimated from the two models and is presented graphically in Figure 4. This value could then be used to determine both the appropriate concentration for testing and the number of samples required to achieve a test with a specified power. From a practical standpoint this may not be a reasonable approach, given that for this application the concentrations are very low and it becomes very difficult to accurately dilute concentrations at these low levels.

[Figure 4: difference in the probability of detection as a function of mean level (log10 scale).]

Fig. 4. Results of the model can be used to determine the average concentration at which the maximum discrepancy occurs. This concentration can be used in a validation test to determine if differences in the two enrichment media exist. The *LOD* is given by the solid vertical line. The dashed vertical line indicates the concentration at which the maximum difference in sensitivity between the old and new enrichment methods occurs.

The solid vertical line on Figure 4 represents the difference in the probability of detection at the mean level of 1 organism per 325 *g*, which is the *LOD* for the current sampling program. The maximum difference between the two test sensitivity models occurs for a concentration of roughly 1 organism per 114 *g* (dashed vertical line), at which point the probability of a positive test result is 0.45 and 0.67 for the old and new enrichment techniques, respectively.
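The location of this maximum can be computed directly from the two fitted sensitivity curves; a sketch using `optimize` with illustrative (not fitted) beta-Poisson parameters:

```r
# Find the mean level x at which the two beta-Poisson sensitivity curves differ most.
sens <- function(x, alpha, beta) 1 - (1 + x / beta)^(-alpha)
gap <- function(log10x) {
  x <- 10^log10x
  sens(x, alpha = 0.5, beta = 2.5) - sens(x, alpha = 0.5, beta = 5.0)  # new - old
}
opt <- optimize(gap, interval = c(-5, 1), maximum = TRUE)
10^opt$maximum   # concentration (cfu/g) of maximum difference
opt$objective    # size of the maximum difference in detection probability
```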

In retrospect it seems fairly obvious that equivalency testing for enrichment methodologies should be performed at levels near the *LOD*. The reasoning for this claim is twofold:

1. Qualitative tests, such as PCR methods, are so sensitive that only a small number of cells are required for detection.
2. The purpose of enrichment techniques is to selectively instigate reproduction of the target organism. Given the exponential growth in the number of organisms once reproduction has begun, a positive test result is expected.
#### **10. Conclusions**

There are a surprising number of applications where this simplified framework can be used to integrate surveillance data with data describing basic demographics of the population (e.g., prevalence and/or levels of contamination) and counts derived from a surveillance system. The advantage of using Bayesian methods with simplified risk assessment models is their ability to combine the available data, objectively calibrate the model to match the surveillance data, and estimate latent variables in the model for which there are few data.

#### **11. Acknowledgement**

This research utilized the Colorado State University ISTeC Cray HPC System supported by NSF Grant CNS-0923386.

#### **12. References**


Albert, I., Grenier, E., Denis, J. & Rousseau, J. (2008). Quantitative risk assessment from farm to fork and beyond: A global Bayesian approach concerning food-borne diseases, *Risk Analysis* 28: 557–571.

Allos, B., Moore, M., Griffin, P. & Tauxe, R. (2004). Surveillance for sporadic foodborne disease in the 21st century: The FoodNet perspective, *Clinical Infectious Diseases* 38: 115–120.

Bartholomew, M., Vose, D., Tollefson, L. & Travis, C. (2005). A linear model for managing the risk of antimicrobial resistance originating in food animals, *Risk Analysis* 25(1): 99–108.

de Jong, B. & Ekdahl, K. (2006). The comparative burden of salmonellosis in the European Union member states, associated and candidate countries, *BMC Public Health* 6(4). URL: *www.biomedcentral.com/1471-2458/6/4*

Ebel, E., Williams, M. & Schlosser, W. (2012). Parametric distributions of under-diagnosis parameters used to estimate annual burden of illness for five foodborne pathogens, *Journal of Food Protection* 74: (in press).

FSIS (1998). Microbiology laboratory guidebook. 3rd ed, *Technical report*, Food Safety and Inspection Service, US Department of Agriculture. http://www.fsis.usda.gov/Science/Microbiological_Lab_Guidebook/index.asp; accessed October 23, 2011.

Furumoto, W. & Mickey, R. (1967). A mathematical model for the infectivity-dilution curve of tobacco mosaic virus: Theoretical considerations, *Virology* 32: 216–223.

Givens, G. H. (1993). *A Bayesian framework and importance sampling methods for synthesizing multiple sources of evidence and uncertainty linked by a complex process model*, PhD thesis, University of Washington, Seattle, Washington.

Haas, C., Rose, J. & Gerba, C. (1999). *Quantitative Microbial Risk Assessment*, John Wiley & Sons, New York, New York.

Hald, T., Vose, D., Wegener, H. & Koupeev, T. (2004). A Bayesian approach to quantify the contribution of animal-food sources to human salmonellosis, *Risk Analysis* 24: 255–269.

Harrigan, W. (1998). *Laboratory Methods in Food Microbiology*, Academic Press, San Diego, California.

Helsel, D. (2005). *Nondetects and Data Analysis: Statistics for Censored Environmental Data*, John Wiley & Sons, Hoboken, New Jersey.

Herikstad, H., Motarjemi, Y. & Tauxe, R. (2002). Salmonella surveillance: A global survey of public health serotyping, *Epidemiology and Infection* 129(1): 1–8.

Knaus, J., Porzelius, C., Binder, H. & Schwarzer, G. (2009). Easier parallel computing in R with snowfall and sfCluster, *The R Journal* 1: 54–59.

Limpert, E., Stahel, W. & Abbt, M. (2001). Log-normal distributions across the sciences: Keys and clues, *Bioscience* 51: 341–352.

Lunn, D., Spiegelhalter, D., Thomas, A. & Best, N. (2009). The BUGS project: Evolution, critique and future directions (with discussion), *Statistics in Medicine* 28: 3049–3082.

Lunn, D., Thomas, A., Best, N. & Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility, *Statistics and Computing* 10(4): 325–337.

Oliver, J. (2005). The viable but nonculturable state in bacteria, *Journal of Microbiology* 43: 93–100.

Oliver, J., Dagher, M. & Linden, K. (2005). Induction of *Escherichia coli* and *Salmonella typhimurium* into the viable but nonculturable state following chlorination of wastewater, *Journal of Water Health* 3: 249–257.

Parsons, D., Orton, T., D'Souza, J., Moore, A., Jones, R. & Dodd, C. (2005). A comparison of three modelling approaches for quantitative risk assessment using the case study of *Salmonella* spp. in poultry meat, *International Journal of Food Microbiology* 98: 35–51.

R Development Core Team (2011). *R: A Language and Environment for Statistical Computing*, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL: *http://www.R-project.org/*

Raftery, A., Givens, G. & Zeh, J. (1995). Inference from a deterministic population dynamics model for bowhead whales, *Journal of the American Statistical Association* 90(430): 402–416.

Rossini, A., Tierney, L. & Li, N. (2007). Simple parallel statistical computing in R, *Journal of Computational and Graphical Statistics* 16(2): 399–420.

Rubin, D. (1987). The calculation of posterior distributions by data augmentation: Comment: A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: The SIR algorithm, *Journal of the American Statistical Association* 82(398): 543–546.

Scallan, E., Hoekstra, R., Angulo, F., Tauxe, R., Widdowson, M., Roy, S., Jones, J. & Griffin, P. (2011). Foodborne illness acquired in the United States - major pathogens, *Emerging Infectious Diseases* 17(1): 7–15.

Vose, D. (2008). *Risk Analysis: A Quantitative Guide*, 3rd edition, John Wiley & Sons, West Sussex, England.

