**Spatio-Temporal Disease Surveillance**

Ross Sparks, Sarah Bolt and Chris Okugami *CSIRO Mathematics, Informatics and Statistics, Sydney,* 

*Australia* 

#### **1. Introduction**

158 Bioterrorism

Wortmann, G. (2004). Ricin Toxin, In: *Physician's Guide to Terrorist Attack*, M.J. Roy (Ed.), 175-

Youn Y.S., Na, D.H., Yoo, S.D., et al. (2005). Carbohydrate-specifically Polyethylene Glycol-

Zilinskas, RA 1997. Iraq's Biological Weapons: The Past or Future. *JAMA,* Vol. 278, pp. 418-

modified Ricin A-chain with Improved Therapeutic Potential. *Int J Biochem Cell Biol*,

179, Human Press Inc., ISBN 1592596630, Totowa, New Jersey

Vol. 37, pp. 1525-1533, ISSN: 1357-2725

424, ISSN 0098-7484

Concern over bio-terrorism has led to a demand for automated methods for the surveillance of disease counts with the ability to rapidly detect outbreaks of disease. While traditional statistical process control methods such as control charts have been found to have early detection properties when monitoring univariate disease counts, these are often inadequate for detecting bio-terrorism events.

There are two principal impediments in statistical process control methods for the detection of bio-terrorism events: firstly, these methods aggregate over space by examining total counts and thus ignore the spatial dimension of the task and secondly they fail to adjust for the usual (seasonal) behaviour of diseases (e.g., Steiner *et al.*, 2011 where the focus is early detection of the start of influenza outbreak). If disease outbreaks were expected to be spread relatively uniformly in space, then the former reason is unimportant. However, since bioterrorism attacks are likely to be introduced to specifically targeted locations, then the resulting disease instances are likely to cluster in space. Consequently, monitoring total counts is likely to reduce the signal-to-noise ratio of this outbreak by aggregating over regions where there is no outbreak. Exploiting the spatial clustering should be able to provide additional power and efficiency in detecting the outbreak.

There is much ongoing work in developing methods that are efficient at detecting outbreaks in a spatial context but these methods may still fail to deal with the latter fault mentioned above. When surveillance for bio-terrorism involves the monitoring of diseases or syndromes already present in the population then the detection of an attack may be delayed if it is introduced during a period of normal seasonal increased activity. For example, respiratory complaints are often much more frequent in the cooler months so detecting an intentional disease outbreak with respiratory symptomatology would need to differentiate between the usual and an unusual increase in cases. Therefore, it is often necessary to remove the influence of the expected behaviour of a disease to detect the signal of an introduced strain early.

Before presenting the method we are proposing in this chapter to deal with both of the above concerns, we begin by outlining some of the existing spatial disease detection methods. The current benchmark in spatio-temporal surveillance is the spatial SCAN statistic (Tango, 1995, Kulldorff, 1995, 1997, 2001, 2005). This method is a spatio-temporal moving average plan that systematically scans the target space applying a test to all windows of data up to a given fixed size in time and space. This presents an intuitive

Spatio-Temporal Disease Surveillance 161

The EWMA Surveillance Tree method proposed here also applies a smoothing in the spatial sense. If smoothing has the advantage of reducing spatial noise and more easily exposes spatial trends in disease outbreaks, then this could translate into earlier detection properties. This chapter will also evaluate when it is beneficial to spatially smooth counts for the early

We will continue by giving a detailed overview of the spatial SCAN statistic. We will then propose the EWMA Surveillance Tree methodology in more detail and discuss the results of some simulation studies comparing this method to the SCAN statistic. Lastly we will briefly discuss the extension of these methods to a broader multivariate context which is the key in

Scan statistics are so described because they arise from the scanning of time and/or space looking for clusters of events. This section will present the details for an implementation of the spatial SCAN statistic that will be used as the benchmark in the simulation studies to follow. The SCAN statistic is similar to a spatio-temporal moving average plan that looks at the number of observations in cylinders of spatio-temporal space (e.g., see Figure 1), where the height of the cylinder is time. The SCAN plan used in this chapter is indistinguishable from the approaches used in Glaz *et al*. (2001) in the spatial dimension but uses a moving window for the temporal dimension. As such the results from Chen and Glaz can be used to get efficient starting estimates for the threshold used in the surveillance plan. For other literature on the SCAN plan see Tango (1995), Kulldorff and Nagarwalla (1995), Kulldorff (1997), and Kulldorff, M. (2001), Kulldorff *et al.* (2005), Woodall et al (2008) and Han *et al.*

Fig. 1. Example of the scanned region – circular area is scanned and the depth is the number

The SCAN plan in this chapter is applied to a fixed lattice structure over the target area of the data for reasons which will become obvious later. Although this lattice structure may not be regular as in Figure 2 below, it is convenient at this stage to make it regular. In other words, in this lattice, individual cells are not taken to have a fixed volume; they are

applications of these methods to syndromic surveillance for emerging diseases.

detection of diseases.

**2. The SCAN statistic** 

of days that counts are aggregated over.

(2008).

approach but has received some criticisms by Woodall *et al.* (2008) and Han *et al.* (2008), including: firstly that it is not as efficient as the cumulative SUM (CUSUM) (see Raubertas, 1989, and Rogerson and Yamada, 2004) for outbreak detection and secondly that its ability to detect outbreaks most effectively is dependent on the choice of shape and size of the scanning window. In addition, in some of the literature on the SCAN statistic, the focus is on whether diseases cluster rather than on early detection.

Some control chart methods have also been proposed such as the Multivariate Exponentially Weighted Moving Average (MEWMA) control chart method proposed by Joner *et al.* (2008). This approach could be extended to the lattice structure of counts discussed below but involving all cell counts in the lattice making it a high dimensional application of the MEWMA plan which is difficult to implement. Exponential weighted moving averages (EWMA) have also proved valuable in the early detection of persistent disease outbreaks (Steiner et al, 2010 and Sparks et al 2010a). However these methods are frequently unable to account for underlying changes that occur in the observation of disease instances through health services such as day-of-the-week or holiday effects. The removal of known trends, called nuisance variables, would reduce the noise in the departures of counts from expected, and as such provides the surveillance plans with a increased chance of early detection.

A bio-terrorist attack is likely to be initiated at several simultaneous disconnected locations causing clustered outbreaks of different size and shape. Having a technology that has the flexibility of finding outbreaks started at several disconnected locations is essential in this setting.

In this chapter, we explore the method EWMA Surveillance Trees that addresses the concerns outlined above and that has this flexibility. The approach taken here is to divide the geographical regions into a lattice structure so that the expected numbers of cases within the cells are as similar as possible. Expected counts are achieved by firstly building a model that describes the usual behaviour of disease counts (including any usual seasonal increases) and then using this model to produce a one day-ahead forecast for the expected counts (see Sparks, *et al.* 2010a). The method proposed then:


So this method should be able to detect outbreaks of varying size and shapes and incorporate the benefits of EWMA smoothing for early detection. Its performance will be measured against an implementation of the SCAN statistic.

approach but has received some criticisms by Woodall *et al.* (2008) and Han *et al.* (2008), including: firstly that it is not as efficient as the cumulative SUM (CUSUM) (see Raubertas, 1989, and Rogerson and Yamada, 2004) for outbreak detection and secondly that its ability to detect outbreaks most effectively is dependent on the choice of shape and size of the scanning window. In addition, in some of the literature on the SCAN statistic, the focus is on

Some control chart methods have also been proposed such as the Multivariate Exponentially Weighted Moving Average (MEWMA) control chart method proposed by Joner *et al.* (2008). This approach could be extended to the lattice structure of counts discussed below but involving all cell counts in the lattice making it a high dimensional application of the MEWMA plan which is difficult to implement. Exponential weighted moving averages (EWMA) have also proved valuable in the early detection of persistent disease outbreaks (Steiner et al, 2010 and Sparks et al 2010a). However these methods are frequently unable to account for underlying changes that occur in the observation of disease instances through health services such as day-of-the-week or holiday effects. The removal of known trends, called nuisance variables, would reduce the noise in the departures of counts from expected, and as such provides the surveillance plans with a increased chance of early

A bio-terrorist attack is likely to be initiated at several simultaneous disconnected locations causing clustered outbreaks of different size and shape. Having a technology that has the flexibility of finding outbreaks started at several disconnected locations is essential in this

In this chapter, we explore the method EWMA Surveillance Trees that addresses the concerns outlined above and that has this flexibility. The approach taken here is to divide the geographical regions into a lattice structure so that the expected numbers of cases within the cells are as similar as possible. Expected counts are achieved by firstly building a model that describes the usual behaviour of disease counts (including any usual seasonal increases) and then using this model to produce a one day-ahead forecast for the expected counts (see

 Builds temporal memory of the disease process by smoothing the time series of cell counts using MEWMAs. This allows the method to build up memory of the process to

 Uses exponentially weighted spatially smoothing of these temporal EWMA cell counts by smoothing across both the rows and columns of the lattice (very similar to

 Detects departures from expected value in the spatio-temporally smoothed lattice counts using a binary recursive partitioning approach that focuses in on specific areas

Prunes these partitions in a way that leaves only the outbreaks, that is areas of

So this method should be able to detect outbreaks of varying size and shapes and incorporate the benefits of EWMA smoothing for early detection. Its performance will be

whether diseases cluster rather than on early detection.

Sparks, *et al.* 2010a). The method proposed then:

detect slowly evolving changes.

significant departure from expected.

measured against an implementation of the SCAN statistic.

MEWMA).

of concern.

detection.

setting.

The EWMA Surveillance Tree method proposed here also applies a smoothing in the spatial sense. If smoothing has the advantage of reducing spatial noise and more easily exposes spatial trends in disease outbreaks, then this could translate into earlier detection properties. This chapter will also evaluate when it is beneficial to spatially smooth counts for the early detection of diseases.

We will continue by giving a detailed overview of the spatial SCAN statistic. We will then propose the EWMA Surveillance Tree methodology in more detail and discuss the results of some simulation studies comparing this method to the SCAN statistic. Lastly we will briefly discuss the extension of these methods to a broader multivariate context which is the key in applications of these methods to syndromic surveillance for emerging diseases.

### **2. The SCAN statistic**

Scan statistics are so described because they arise from the scanning of time and/or space looking for clusters of events. This section will present the details for an implementation of the spatial SCAN statistic that will be used as the benchmark in the simulation studies to follow. The SCAN statistic is similar to a spatio-temporal moving average plan that looks at the number of observations in cylinders of spatio-temporal space (e.g., see Figure 1), where the height of the cylinder is time. The SCAN plan used in this chapter is indistinguishable from the approaches used in Glaz *et al*. (2001) in the spatial dimension but uses a moving window for the temporal dimension. As such the results from Chen and Glaz can be used to get efficient starting estimates for the threshold used in the surveillance plan. For other literature on the SCAN plan see Tango (1995), Kulldorff and Nagarwalla (1995), Kulldorff (1997), and Kulldorff, M. (2001), Kulldorff *et al.* (2005), Woodall et al (2008) and Han *et al.* (2008).

Fig. 1. Example of the scanned region – circular area is scanned and the depth is the number of days that counts are aggregated over.

The SCAN plan in this chapter is applied to a fixed lattice structure over the target area of the data for reasons which will become obvious later. Although this lattice structure may not be regular as in Figure 2 below, it is convenient at this stage to make it regular. In other words, in this lattice, individual cells are not taken to have a fixed volume; they are

Spatio-Temporal Disease Surveillance 163

bio-terrorism outbreak is known in advance then it is best to use these dimensions precisely in designing the scanning window, unless the outbreak is sparse within the region. In this case a smaller window is needed (see Sparks, 2011c) to exploit this sparsity. Unfortunately, we seldom know the dimension of bio-terrorism outbreaks, and therefore we need methodology that is flexible enough for detecting unknown outbreaks quickly. The next section discusses such flexible technology that is not as dependent on the shape

The EWMA Surveillance Tree method proposed here for comparison against the SCAN plan uses an EWMA approach by smoothing counts first temporally to build in temporal memory of the process, and then spatially to average counts locally in space. Smoothing can also be viewed as an approach that reduces the noise in counts without reducing the signal too much. This makes the signal-to-noise ratio larger and therefore has early detection advantages. There are many examples of EWMA temporal smoothing of counts data aimed at early detection of diseases, for example see Sparks *et al.* (2010a) and Stiener *et al.* (2010) for Poisson counts, and Sparks *et al.* (2010b) and Sarka (2011) for negative binomial counts. There is no literature on spatial exponentially weighted moving average smoothing although there is some work by Rogerson et al (2006) using 2 dimensional kernel smoothers. The smoothed counts are then used in an outbreak

The EWMA smoothing process has computational advantages because only the last temporally smoothed cell counts are retained rather than the last 10 days as is necessary in the SCAN plan discussed in the previous section. Areas are searched for outbreaks by comparing their smoothed observed counts to their respective smoothed expected counts. An outbreak is signalled when the difference between observed and expected counts in a window exceeds a threshold. The value of the threshold is chosen to control the false alarm rate. Note that EWMA smoothed counts are no longer Poisson distributed, however EWMA smoothing (spatial and temporal) helps improve the approximation to a normal distribution

1. EWMA based smoothing of observed and expected counts, first temporally then

2. Growing a surveillance tree of departures from expected value in the spatio-temporally

Define 40 by 40 matrices of counts and expected values for day *t* by ( , ,) { } *Y y t kt* and

respectively. Temporal smoothing examines the exponentially weighted

3. Pruning the surveillance tree to reveal outbreaks and control the false alarm rate.

smoothed counts using a binary recursive partitioning approach

Each of these steps is detailed in subsections 3.1, 3.2 and 3.3 respectively.

and size of the outbreak.

detection procedure.

of the smoothed counts.

spatially;

( , ,) { } *t kt* 

The method consists of three major steps:

**3.1 EWMA temporal and spatial smoothing** 

moving average (EWMA) to the counts according to

**3. EWMA surveillance trees** 

constructed so that the marginal total across each row/column spatial dimension has approximately the same expected number of counts. In addition the horizontal and vertical dividers of cells may be curved giving the design greater flexibility.

The lattice structure reduces the amount of multiple testing by forcing the scanned area to always include complete spatial ("square") cells. However, if a bio-terrorism event spans parts of some cells then some information about its signal is lost. The time window is arbitrarily taken as 10 days in this chapter, but could be taken as 7 days or 14 days to do away with the need for accounting for the influence of day-of-the-week effects.

The SCAN plan counts the number of disease cases within the scanned space-time window and compares this count to its expected count, assuming a Poisson distribution. Often autocorrelation of counts is ignored in this process which does cause inefficiencies in the inference of detecting bio-terrorism events. In this chapter we will also ignore the autocorrelation in the temporal counts. Since the SCAN plan involves significant amounts of multiple testing, we need to establish the false alarm rates for the level of multiple testing.

Fig. 2. The target region for assessing outbreaks is 20 by 20 cells by time and the scanned area is all 5 by 5 cells by time volumes - one example of which is illustrated in grey.

In this paper, we use a 40 by 40 cell target region. We examined two levels of spatial memory in the SCAN plans applied later in the chapter. The spatial cells scanned are either 5 by 5 or 10 by 10 both with a moving time window of 10 days. The 5 by 5 cell scan involves 36x36=1296 different (but overlapping) scanned regions, and the 10 by 10 cell scan involves 961 different (but overlapping) scanned regions. Clearly, the 10 by 10 cell scan involves fewer tests and so its level of significance for the fixed false alarm rate is larger than the 5 by 5 cell scan. This means that it should detect an outbreak close to 10 by 10 in dimensions earlier than the 5 by 5 cell scan. Therefore if the dimension of a future

constructed so that the marginal total across each row/column spatial dimension has approximately the same expected number of counts. In addition the horizontal and vertical

The lattice structure reduces the amount of multiple testing by forcing the scanned area to always include complete spatial ("square") cells. However, if a bio-terrorism event spans parts of some cells then some information about its signal is lost. The time window is arbitrarily taken as 10 days in this chapter, but could be taken as 7 days or 14 days to do

The SCAN plan counts the number of disease cases within the scanned space-time window and compares this count to its expected count, assuming a Poisson distribution. Often autocorrelation of counts is ignored in this process which does cause inefficiencies in the inference of detecting bio-terrorism events. In this chapter we will also ignore the autocorrelation in the temporal counts. Since the SCAN plan involves significant amounts of multiple testing, we need to establish the false alarm rates for the level of multiple

Fig. 2. The target region for assessing outbreaks is 20 by 20 cells by time and the scanned area is all 5 by 5 cells by time volumes - one example of which is illustrated in grey.

In this paper, we use a 40 by 40 cell target region. We examined two levels of spatial memory in the SCAN plans applied later in the chapter. The spatial cells scanned are either 5 by 5 or 10 by 10 both with a moving time window of 10 days. The 5 by 5 cell scan involves 36x36=1296 different (but overlapping) scanned regions, and the 10 by 10 cell scan involves 961 different (but overlapping) scanned regions. Clearly, the 10 by 10 cell scan involves fewer tests and so its level of significance for the fixed false alarm rate is larger than the 5 by 5 cell scan. This means that it should detect an outbreak close to 10 by 10 in dimensions earlier than the 5 by 5 cell scan. Therefore if the dimension of a future

dividers of cells may be curved giving the design greater flexibility.

testing.

away with the need for accounting for the influence of day-of-the-week effects.

bio-terrorism outbreak is known in advance then it is best to use these dimensions precisely in designing the scanning window, unless the outbreak is sparse within the region. In this case a smaller window is needed (see Sparks, 2011c) to exploit this sparsity. Unfortunately, we seldom know the dimension of bio-terrorism outbreaks, and therefore we need methodology that is flexible enough for detecting unknown outbreaks quickly. The next section discusses such flexible technology that is not as dependent on the shape and size of the outbreak.

#### **3. EWMA surveillance trees**

The EWMA Surveillance Tree method proposed here for comparison against the SCAN plan uses an EWMA approach by smoothing counts first temporally to build in temporal memory of the process, and then spatially to average counts locally in space. Smoothing can also be viewed as an approach that reduces the noise in counts without reducing the signal too much. This makes the signal-to-noise ratio larger and therefore has early detection advantages. There are many examples of EWMA temporal smoothing of counts data aimed at early detection of diseases, for example see Sparks *et al.* (2010a) and Stiener *et al.* (2010) for Poisson counts, and Sparks *et al.* (2010b) and Sarka (2011) for negative binomial counts. There is no literature on spatial exponentially weighted moving average smoothing although there is some work by Rogerson et al (2006) using 2 dimensional kernel smoothers. The smoothed counts are then used in an outbreak detection procedure.

The EWMA smoothing process has computational advantages because only the last temporally smoothed cell counts are retained rather than the last 10 days as is necessary in the SCAN plan discussed in the previous section. Areas are searched for outbreaks by comparing their smoothed observed counts to their respective smoothed expected counts. An outbreak is signalled when the difference between observed and expected counts in a window exceeds a threshold. The value of the threshold is chosen to control the false alarm rate. Note that EWMA smoothed counts are no longer Poisson distributed, however EWMA smoothing (spatial and temporal) helps improve the approximation to a normal distribution of the smoothed counts.

The method consists of three major steps:


Each of these steps is detailed in subsections 3.1, 3.2 and 3.3 respectively.

#### **3.1 EWMA temporal and spatial smoothing**

Define 40 by 40 matrices of counts and expected values for day *t* by ( , ,) { } *Y y t kt* and ( , ,) { } *t kt* respectively. Temporal smoothing examines the exponentially weighted moving average (EWMA) to the counts according to

defined)

40 40

40 40 ~. . 1 1 *j j j j c Y*

3. **Select the best partition:** Partition on

generated by the following steps:

~ .. 1 1 *p p i i*

*q q j j j j j j c Y*

3. **Select the best partition**: Partition on:

either

` .. 1 1 *ki i i k i k r Y* for 1,2,...,39. *<sup>k</sup>*

max( , ) max( , ) *kk ii* ~ ~ *r r rr* for all *i k* .

max( , ) max( , ) ~ ~ *j j c c cc* for all *j* .

~1 ~ *z c* ; <sup>1</sup> *<sup>k</sup> z r* then ~1 *<sup>k</sup> z r* ; and if 1 *<sup>k</sup> z r* then ~1 *<sup>k</sup> z r* .

*ki i i k i k r Y* for 1 1 , 1,..., 1. *<sup>p</sup> k ii i*

2. **Longitudinal partitions:** Calculate

other offspring have signal-to-noise ratio <sup>~</sup>*<sup>g</sup> z* .

All cells in the parent space are less than expected.

. . 1 1

Spatio-Temporal Disease Surveillance 165

node the parent space is *P kk* { : 1,2,...,40} for both *i* and *j* ( *<sup>i</sup>*. and .*<sup>j</sup>* are similarly

*k k ki i i i r Y* and

*k k ki i i i i i r Y* and

1 1 . . *j j j j j j c Y*

or

or

1. **Latitudinal (row oriented) partitions:** Calculate . . 1 1

2. **Longitudinal (column oriented) partitions:** Calculate . . 1 1 *j j j j c Y*

a. Row 1 to *k* and *k*+1 to 40 if max( , ) max( , ) *k k* ~ ~ *rr cc* for all and

b. Column 1 to and +1 to 40 if max( , ) max( , ) ~ ~ *k k cc rr* for all *k* and

4. Define 1 ~~ max( , , , ) *k k z cc rr* as partition score for the right-hand partition and let the other offspring have signal-to-noise ratio ~1 *z* . If 1 ~ *z c* then ~1 *z c* ; if 1*z c* then

Now we repeat the process for the next generation of offspring by considering each offspring of the partition above as a parent space made up of *p* rows and *q* columns defined by rows 1 1 , 1,..., *<sup>p</sup> ii i* and columns 1 1 , 1,..., *<sup>q</sup> jj j* . Then a further two offspring are

a. Row 1*i* to *k* and *k*+1 to *pi* if max( , ) max( , ) *k k* ~ ~ *rr cc* for all and

b. Column 1*j* to and +1 to *qj* if max( , ) max( , ) ~ ~ *k k cc rr* for all *k* and

4. Define *<sup>g</sup>* max( , , , ) ~ ~ *k k z cc rr* as partition score for the right-hand partition and let the

The process is repeated for each new generation and only stops when the smooth counts are

Too small to alarm an unusually high count (this is discussed in more detail later).

1. **Latitudinal partitions:** Calculate 1 1 . .

for 1 1 , 1,..., 1. *<sup>q</sup> jj j*

max( , ) max( , ) ~ ~ *k ii rr rr* for all 1 1 ( , 1,..., ) *<sup>p</sup> i k ii i* .

max( , ) max( , ) ~ ~ *j j cc cc* for all 1 1 ( , 1,..., ) *<sup>q</sup> j jj j* .

for 1,2,...,39.

$$
\overline{Y}\_t = \alpha Y\_t + (1 - \alpha)\overline{Y}\_{t-1}
$$

(*Y*0 1 ) and similarly smooths the matrix of expected values of these counts using

$$
\overline{\Lambda}\_t = a \Lambda\_t + (1 - a) \overline{\Lambda}\_{t-1}
$$

where 0 1 , and 0<<1 is a constant that determines how much memory to retain in the averages. Larger values of retain less temporal memory.

Once the counts have been smoothed temporally, we then smooth over space to average counts locally. So the temporal smoothed counts and temporal smoothed expected values of *Yt* and *t* respectively are now spatially smoothed using smoothing matrix defined by | | 40 | | <sup>1</sup> { } {(1 ) / (1 ) } *<sup>i</sup> <sup>j</sup> <sup>i</sup> <sup>j</sup> ij <sup>j</sup> A* . The spatial smoothing of counts is defined by

$$
\tilde{Y}\_t = A \, \overline{Y}\_t A^t
$$

and the spatial smoothing of their expected values is defined by

$$
\tilde{\Lambda}\_t = A \overline{\Lambda}\_t A^t
$$

This EWMA spatial smoothing has the advantage over other smoothers of computational simplicity and is efficient in terms of early outbreak detection. Alternative spatial smoothing involving two dimensional kernel smoothers (Riedel, 1993, Rogerson *et al.* 2006) or splines (Currie *et al.* 2003, Lee and Durban, 2009) are more complex computationally. The EWMA smoother is similar to a kernel smoother using a double exponential distribution. The main reason for this choice is computational speed. We ignore the boundary problems with this EWMA smoothing approach. It is clear that the spatial smoothing employed here smooths boundary cell counts less. In addition, the recursive partitioning approach does not adjust for the differential smoothing. Therefore the surveillance trees detect outbreaks that hug a boundary very quickly. For this reason no clusters are considered that hug boundaries in the simulation section, because this would exaggerate the performance of surveillance trees.

#### **3.2 Growing the surveillance trees**

The recursive partitioning approach applied in this paper is very similar to that in Yu (2009) and identical to Sparks and Okugami (2010b). Surveillance trees generate offspring by recursively partitioning either longitudinal cells or latitudinal cells into rectangular regions. We start by finding the best partition of longitudinal cells in terms of having the highest signal to noise ratio, and then the best partition of latitudinal cells. Thereafter, select the partition with the highest signal-to-noise ratio between the best longitudinal cell partition and the best latitudinal cell partition.

The choice of the signal-to-noise ratio as the measure of departure from expected value was decided based on the fact that the square root of the smooth counts has roughly a constant variance and its mean is equal to the square root of the smoothed mean.

The mathematical summary of this process for the first generation of offspring is as follows:

denote *i i* . *<sup>j</sup> j P Y Y* and .*<sup>j</sup> ij i P Y Y* where *P* defines the parent space, e.g., at the root

<sup>1</sup> (1 ) *YY Y tt t*

<sup>1</sup> (1 ) *tt t*

where 0 1 , and 0<<1 is a constant that determines how much memory to retain in the

Once the counts have been smoothed temporally, we then smooth over space to average counts locally. So the temporal smoothed counts and temporal smoothed expected values of *Yt* and *t* respectively are now spatially smoothed using smoothing matrix defined by

*<sup>t</sup> Y AY A t t*

*<sup>t</sup> t t <sup>A</sup> <sup>A</sup>* .

This EWMA spatial smoothing has the advantage over other smoothers of computational simplicity and is efficient in terms of early outbreak detection. Alternative spatial smoothing involving two dimensional kernel smoothers (Riedel, 1993, Rogerson *et al.* 2006) or splines (Currie *et al.* 2003, Lee and Durban, 2009) are more complex computationally. The EWMA smoother is similar to a kernel smoother using a double exponential distribution. The main reason for this choice is computational speed. We ignore the boundary problems with this EWMA smoothing approach. It is clear that the spatial smoothing employed here smooths boundary cell counts less. In addition, the recursive partitioning approach does not adjust for the differential smoothing. Therefore the surveillance trees detect outbreaks that hug a boundary very quickly. For this reason no clusters are considered that hug boundaries in the simulation section, because this would exaggerate the performance of surveillance trees.

The recursive partitioning approach applied in this paper is very similar to that in Yu (2009) and identical to Sparks and Okugami (2010b). Surveillance trees generate offspring by recursively partitioning either longitudinal cells or latitudinal cells into rectangular regions. We start by finding the best partition of longitudinal cells in terms of having the highest signal to noise ratio, and then the best partition of latitudinal cells. Thereafter, select the partition with the highest signal-to-noise ratio between the best longitudinal cell partition

The choice of the signal-to-noise ratio as the measure of departure from expected value was decided based on the fact that the square root of the smooth counts has roughly a constant

The mathematical summary of this process for the first generation of offspring is as follows:

where *P* defines the parent space, e.g., at the root

variance and its mean is equal to the square root of the smoothed mean.

. The spatial smoothing of counts is defined by

 

 

(*Y*0 1 ) and similarly smooths the matrix of expected values of these counts using

averages. Larger values of retain less temporal memory.

 

and the spatial smoothing of their expected values is defined by


<sup>1</sup> { } {(1 ) / (1 ) } *<sup>i</sup> <sup>j</sup> <sup>i</sup> <sup>j</sup>*

**3.2 Growing the surveillance trees** 

and the best latitudinal cell partition.

and .*<sup>j</sup> ij i P Y Y*

denote *i i* . *<sup>j</sup> j P Y Y*

*ij <sup>j</sup> A* 

node the parent space is *P kk* { : 1,2,...,40} for both *i* and *j* ( *<sup>i</sup>*. and .*<sup>j</sup>* are similarly defined)


$$\mathcal{C}\_{\sim \ell} = \sqrt{\sum\_{j=\ell+1}^{40} \tilde{Y}\_{\cdot j}} - \sqrt{\sum\_{j=\ell+1}^{40} \tilde{\Lambda}\_{\cdot j}} \text{ for } \ell = 1, 2, ..., 39.$$

	- a. Row 1 to *k* and *k*+1 to 40 if max( , ) max( , ) *k k* ~ ~ *rr cc* for all and max( , ) max( , ) *kk ii* ~ ~ *r r rr* for all *i k* .
	- b. Column 1 to and +1 to 40 if max( , ) max( , ) ~ ~ *k k cc rr* for all *k* and max( , ) max( , ) ~ ~ *j j c c cc* for all *j* .

Now we repeat the process for the next generation of offspring by considering each offspring of the partition above as a parent space made up of *p* rows and *q* columns defined by rows 1 1 , 1,..., *<sup>p</sup> ii i* and columns 1 1 , 1,..., *<sup>q</sup> jj j* . Then a further two offspring are generated by the following steps:

	- a. Row 1*i* to *k* and *k*+1 to *pi* if max( , ) max( , ) *k k* ~ ~ *rr cc* for all and max( , ) max( , ) ~ ~ *k ii rr rr* for all 1 1 ( , 1,..., ) *<sup>p</sup> i k ii i* .
	- b. Column 1*j* to and +1 to *qj* if max( , ) max( , ) ~ ~ *k k cc rr* for all *k* and max( , ) max( , ) ~ ~ *j j cc cc* for all 1 1 ( , 1,..., ) *<sup>q</sup> j jj j* .

The process is repeated for each new generation and only stops when the smooth counts are either


Spatio-Temporal Disease Surveillance 167

control counts is presented in Figure 3. Latitude (La) cells are numbered from the bottom of row as 1 to the top row as 20. Longitude (Lo) cells are numbered 1 (left) to 20 (right). Correlated cell counts are noted from Figure 3 by higher than expected cell counts (greater than 2,5) clumping together even when in-control, e.g., latitude number rows 18, 19 and 20 (La>17) and longitude columns 7, 8, 9 and 10 (6<Lo<11). This makes it more difficult to

Fig. 5. Presenting offspring for 5 generations using recursive partitioning

We use transitional Poisson regression models to forecast the expected counts one day ahead. We assume that these forecasts, when no bio-terrorism event is present, produce unbiased estimates of the cell means. Therefore, in our application, the in-control count departures from the forecasts are assumed uncorrelated because the dependence is removed by the model forecasts, and the forecast errors are assumed spatially uncorrelated when incontrol. The assumption here is that all the spatial correlation is in the mean of the Poisson counts, and the day ahead forecast is an unbiased estimate of this mean. As such, the differences between functions of counts and forecasts will be assumed to be uncorrelated.

identify bio-terrorism events.


The best longitudinal partition and best latitudinal partition are both the parent space.

Fig. 3. An example of in-control counts when cell means are correlated.


Fig. 4. An example of a hypothetical bio-terrorism event.

An example, of the recursive partitioning when there is no smoothing of counts is now used to demonstrate the growing process for a 20 20 grid. Assume that the in-control Poisson counts for each cell in the grid (except boundary cells) have a mean of 2,5. An example of in-

The best longitudinal partition and best latitudinal partition are both the parent space.

Fig. 3. An example of in-control counts when cell means are correlated.

Fig. 4. An example of a hypothetical bio-terrorism event.

An example, of the recursive partitioning when there is no smoothing of counts is now used to demonstrate the growing process for a 20 20 grid. Assume that the in-control Poisson counts for each cell in the grid (except boundary cells) have a mean of 2,5. An example of incontrol counts is presented in Figure 3. Latitude (La) cells are numbered from the bottom of row as 1 to the top row as 20. Longitude (Lo) cells are numbered 1 (left) to 20 (right). Correlated cell counts are noted from Figure 3 by higher than expected cell counts (greater than 2,5) clumping together even when in-control, e.g., latitude number rows 18, 19 and 20 (La>17) and longitude columns 7, 8, 9 and 10 (6<Lo<11). This makes it more difficult to identify bio-terrorism events.

Fig. 5. Presenting offspring for 5 generations using recursive partitioning

We use transitional Poisson regression models to forecast the expected counts one day ahead. We assume that these forecasts, when no bio-terrorism event is present, produce unbiased estimates of the cell means. Therefore, in our application, the in-control count departures from the forecasts are assumed uncorrelated because the dependence is removed by the model forecasts, and the forecast errors are assumed spatially uncorrelated when incontrol. The assumption here is that all the spatial correlation is in the mean of the Poisson counts, and the day ahead forecast is an unbiased estimate of this mean. As such, the differences between functions of counts and forecasts will be assumed to be uncorrelated.

Spatio-Temporal Disease Surveillance 169

Figure 5 revealed that this gave the highest z-score of target region. Recursive partitioning is like stepwise variable selection when building parsimonious regression models – it does not always produce the same result as an exhaustive search. In this example, we have demonstrated that recursive partitioning found the largest outbreak area using significantly smaller number of computations than for example the SCAN statistic with multiple size windows. This balance between computational and detection efficiency is the most attractive feature of the recursive partitioning approach besides its ability to

Figure 6 is a tree based representation of the recursive partitions made in Figure 5. Note that the terminal node defined by rules La>15, Lo<10 and Lo>7 does not have any offspring. The *<sup>g</sup> z* for this node is ( 36 25) 1 where 36 3 5 3 4 3 3 4 4 4 3 and 25 2,5 10 . No partition of this space produces a higher z-score than the parent node's zscore of 1. The closest to this is ( 5 4 3 4 3 5 2,5 0,825 . Ideally we would like to stop recursive partitioning at this node. In practice, the decision to stop unnecessarily growing of the tree is difficult. Generally we would keep partitioning until some stopping

Once partitioning has stopped, then recursive pruning of the terminal nodes commences.

The aim of pruning is to recursively trim away all terminal nodes with smoothed counts that do not differ significantly from their expected values. If all nodes in the tree are pruned away to leave no tree, then no outbreak is signalled. However, if terminal nodes remain after pruning, then an alarm is given. The geographical location of the outbreak is diagnosed by the set of partitioning rules for the remaining terminal nodes. This could involve several

Pruning only starts after growing the tree has stopped. The pruning process is very simple. We prune terminal nodes recursively starting with the last generation offspring in the tree, and then repeat the process recursively with each remaining generation. Only branches with significantly higher than expected counts survive pruning. We prune the least significant offspring if ~*<sup>g</sup> <sup>z</sup> z h* and prune and the generation if *<sup>g</sup> <sup>z</sup> z h* where *<sup>z</sup> h* is a positive constant designed to deliver a specified false alarm. Pruning is used to control the false alarm rate measured by either specifying an in-control Average Run Length (ARL) or an in-control Recurrence Interval (RI). The Recurrence Interval (RI) is used to indicate the steady state false alarm rate. ARL and RI are defined more clearly in the simulation section (see Section 4). Pruning starts with the highest values of *g* (generations) and stops when either all generations are pruned or when at least one generation (terminal node) survives

The pruning process leaves only generations with smoothed counts significantly higher than expected, e.g., if we started pruning from generation 4 in Figure 5 then only two offspring containing the outbreak regions plus their parents, grandparents, etc would survive the pruning process. If pruning was applied to a tree for the data in Figure 3 with

**3.3 Pruning the surveillance tree to deliver a particular false alarm rate** 

adapt with size and shape of the outbreak.

The pruning process is outlined in the next section.

rule applies.

non-overlapping locations.

the pruning process.

Even if this is not true; provided the bootstrap population used to estimate the threshold is accurate, the simulation process will produce an unbiased estimate of the threshold (see Sparks et al. 2010b).

Figure 4 presents another representation of counts illustrated in Figure 3, except in Figure 4 a bio-terrorism event is simulated by adding additional (out-of-control) counts at Latitudes 16, 17, 18 and 19 (15 20) *La* and longitude 10, 11,…, 19 (10 20) *Lo* . The additional counts were assumed to have a mean of 2 per cell. Red is used to highlight the simulated bioterrorism outbreak region in Figure 4. It is clear that the counts in red are higher than the counts for any other part of the 20 20 target region. We now demonstrate the recursive partitioning process in trying to find this outbreak.

The recursive partitioning starts by searching for the best partition of longitude and latitude in terms of maximizing the departure of the region's total counts from their expected value, i.e., maximising *kr* and *c* values, respectively. These are defined by rules La>15 and Lo>9, respectively. Generally the region with the greatest fraction of red will be the one with the highest departure from expected and this is true in this case. The best first partition of La>15 is represented by the black line in the top left-hand table in Figure 5. This first generation of offspring become parents for the next generation of offspring. Each region now grows two offspring by finding the partition that is best for each parent. For the parent region with La>15, it is clear that the best partition is Lo>9 (see the black line in top right-hand table of Figure 5 that separates the red counts from the black counts as best as possible). For the region with La<16, the best partition is less obvious – it turns out to be Lo>17. The bottom left-hand table in Figure 5 gives the next generation of offspring, and the bottom right-hand table in Figure 5 gives the generation that completely specifies the simulated outbreak. The z-score for the red region is g z 10,6 which is higher than any other partitioned region in the table of Figure 5. An exhaustive search of all regions of all different shapes and sizes in

Fig. 6. Surveillance tree representation of the recursive partitions in Figure 5.

Even if this is not true; provided the bootstrap population used to estimate the threshold is accurate, the simulation process will produce an unbiased estimate of the threshold (see

Figure 4 presents another representation of counts illustrated in Figure 3, except in Figure 4 a bio-terrorism event is simulated by adding additional (out-of-control) counts at Latitudes 16, 17, 18 and 19 (15 20) *La* and longitude 10, 11,…, 19 (10 20) *Lo* . The additional counts were assumed to have a mean of 2 per cell. Red is used to highlight the simulated bioterrorism outbreak region in Figure 4. It is clear that the counts in red are higher than the counts for any other part of the 20 20 target region. We now demonstrate the recursive

The recursive partitioning starts by searching for the best partition of longitude and latitude in terms of maximizing the departure of the region's total counts from their expected value, i.e., maximising *kr* and *c* values, respectively. These are defined by rules La>15 and Lo>9, respectively. Generally the region with the greatest fraction of red will be the one with the highest departure from expected and this is true in this case. The best first partition of La>15 is represented by the black line in the top left-hand table in Figure 5. This first generation of offspring become parents for the next generation of offspring. Each region now grows two offspring by finding the partition that is best for each parent. For the parent region with La>15, it is clear that the best partition is Lo>9 (see the black line in top right-hand table of Figure 5 that separates the red counts from the black counts as best as possible). For the region with La<16, the best partition is less obvious – it turns out to be Lo>17. The bottom left-hand table in Figure 5 gives the next generation of offspring, and the bottom right-hand table in Figure 5 gives the generation that completely specifies the simulated outbreak. The z-score for the red region is g z 10,6 which is higher than any other partitioned region in the table of Figure 5. An exhaustive search of all regions of all different shapes and sizes in

Fig. 6. Surveillance tree representation of the recursive partitions in Figure 5.

Sparks et al. 2010b).

partitioning process in trying to find this outbreak.

Figure 5 revealed that this gave the highest z-score of target region. Recursive partitioning is like stepwise variable selection when building parsimonious regression models – it does not always produce the same result as an exhaustive search. In this example, we have demonstrated that recursive partitioning found the largest outbreak area using significantly smaller number of computations than for example the SCAN statistic with multiple size windows. This balance between computational and detection efficiency is the most attractive feature of the recursive partitioning approach besides its ability to adapt with size and shape of the outbreak.

Figure 6 is a tree based representation of the recursive partitions made in Figure 5. Note that the terminal node defined by rules La>15, Lo<10 and Lo>7 does not have any offspring. The *<sup>g</sup> z* for this node is ( 36 25) 1 where 36 3 5 3 4 3 3 4 4 4 3 and 25 2,5 10 . No partition of this space produces a higher z-score than the parent node's zscore of 1. The closest to this is ( 5 4 3 4 3 5 2,5 0,825 . Ideally we would like to stop recursive partitioning at this node. In practice, the decision to stop unnecessarily growing of the tree is difficult. Generally we would keep partitioning until some stopping rule applies.

Once partitioning has stopped, then recursive pruning of the terminal nodes commences. The pruning process is outlined in the next section.

#### **3.3 Pruning the surveillance tree to deliver a particular false alarm rate**

The aim of pruning is to recursively trim away all terminal nodes with smoothed counts that do not differ significantly from their expected values. If all nodes in the tree are pruned away to leave no tree, then no outbreak is signalled. However, if terminal nodes remain after pruning, then an alarm is given. The geographical location of the outbreak is diagnosed by the set of partitioning rules for the remaining terminal nodes. This could involve several non-overlapping locations.

Pruning only starts after growing the tree has stopped. The pruning process is very simple. We prune terminal nodes recursively starting with the last generation offspring in the tree, and then repeat the process recursively with each remaining generation. Only branches with significantly higher than expected counts survive pruning. We prune the least significant offspring if ~*<sup>g</sup> <sup>z</sup> z h* and prune and the generation if *<sup>g</sup> <sup>z</sup> z h* where *<sup>z</sup> h* is a positive constant designed to deliver a specified false alarm. Pruning is used to control the false alarm rate measured by either specifying an in-control Average Run Length (ARL) or an in-control Recurrence Interval (RI). The Recurrence Interval (RI) is used to indicate the steady state false alarm rate. ARL and RI are defined more clearly in the simulation section (see Section 4). Pruning starts with the highest values of *g* (generations) and stops when either all generations are pruned or when at least one generation (terminal node) survives the pruning process.

The pruning process leaves only generations with smoothed counts significantly higher than expected, e.g., if we started pruning from generation 4 in Figure 5 then only two offspring containing the outbreak regions plus their parents, grandparents, etc would survive the pruning process. If pruning was applied to a tree for the data in Figure 3 with

Spatio-Temporal Disease Surveillance 171

but random in space then these changes are expected to be ironed out by the spatial smoothing. Adaptive plans can be derived using the results in Table 1 based on Sparks

Average counts 0,7 0,8 0,9 1,0 1,1 1,27

1,15 1,2 1,25

73,7 100,2 111,4

52,3 68,7 86,5

This section provides the results of simulation studies comparing the performance of the SCAN statistic with that of the EWMA Surveillance Trees for two outbreak shapes and locations using Average Time to Signal and Average Run Length as the measures of performance. When the size and shape of the scanning window matches that of the outbreak in the SCAN plan, then it remains optimal. However, the flexible partitions of EWMA Surveillance Trees result in robust performance across a range of outbreak dimensions,

We simulate uncorrelated Poisson counts on a 40x40 spatial grid using the R function in Appendix A as described in Section 3.4, with the exception that the mean value is bounded at a minimum of one (i.e., k=0) which makes the counts more homogeneous. Denote these counts by *<sup>i</sup>*, *<sup>j</sup> c* for row *i* and column *j*. These counts are then aggregated locally as follows

, i,j i-1,j i,j-1 i 1,j i,j 1 i-1,j-1 i-1,j 1 i 1,j 1 i 1,j 1 floor(c +0,3 (c +c +c +c )+0,1 (c +c +c +c ) ) *i j c*

where floor means round down to the nearest whole number. Boundary cells have smaller

This generates more realistic spatially correlated in-control counts for infectious or contagious diseases. Note that these counts are not expected to be Poisson distributed, but tend to be under-dispersed counts relative to the Poisson distribution. We assume no knowledge of this under-dispersion in trying to forecast counts. We model the trends in the counts using transitional Poisson regression model and assume that if the counts are incontrol then the one day ahead forecast errors are uncorrelated. In situations where spatial correlation persists in the forecast errors for in-control situations, an alternative approach is models which account for this spatial correlation, such as seeming unrelated Poisson regression models (King, 1989, and Grijalva, T., Bohara, *et al.* 2003), and multivariate Poisson

1,15 1,2 1,25

78,7 106,4 119,2

56,1 72,4 88,9 1,15 1,2 1,25

83,1 109,7 126,2

60,1 68,7 86,5 1,15 1,2 1,25

88,7 117,9 149,3

69,3 82,5 97,4

1,15 1,2 1,25

62,5 84,4 91,8

51,6 63,9 83,6

Table 1. In-control ARLs and RIs for specified thresholds of 1,15, 1,2 and 1,25

(2000) and Sparks et al (2010a).

1,15 1,2 1,25

55,4 72,4 81,1

50,3 61,0 75,0

without being optimal for any specific outbreak size or shape.

means and their counts are defined as in Table 2.

*z h*

In-control ARL

In-control RI

**4. Simulation study** 

for non-boundary cells:

no outbreaks, then each generation in turn would be pruned away until no offspring remain. Even the target region (the original parent) would be pruned indicating no significant outbreak.

#### **3.4 Using the example to demonstrate the stopping rules for the tree growing process**

It is useful to stop partitioning when it is known that the threshold will not be exceeded with further partitions. Assume that the threshold is 4,45 *<sup>z</sup> h* then the node defined by rules La>15, Lo<10 and Lo>7 should be terminal because ( 36 2.5) 4,4 . No partition of this parent space is going to be greater than the threshold of 4,45. Two other nodes would terminate in generation three Figure 5 using 4,45 *<sup>z</sup> h* . These would be:

a. Lo>9 and La>19 where

$$\left(\sqrt{4+3+2+2+3+2+2+1+0+0+1}-\sqrt{2,5}\right) = 2, 9 < 4, 45 \dots$$

b. La<3 and Lo>17 where

$$\sqrt{(\sqrt{3+2+1+4+2+1}-\sqrt{2,5})} = 2, 0 < 4, 45 \dots$$

Such a stopping rule stops the unnecessary growing of the surveillance tree, because all offspring will be pruned later. Generally this rule can be defined as stopping when the square root of the total of cell counts for a region is smaller than the threshold plus square root of the smallest cell mean in the region. If the threshold is quite low, then having a threshold on the lower limit for the smoothed count is sensible. Assume that our threshold turns out to be 1,15 *<sup>z</sup> h* then the above stopping rule does not help much in reducing the partitioning complexity. In the above example, it is sometimes helpful to specify when you would fail to respond, e.g., no response is made for any outbreak with counts less than 5 above expected per day. We would then stop if smoothed counts were below 7,5.

#### **3.5 Lattice with cell means that are spatially non-homogeneous**

Until now, we have assumed that all cells follow the same model. What if the same time trends persist across cells, e.g., the seasonal trend remains the same for each cell but the mean counts change from cell to cell? How does this influence the threshold for when the model is not homogeneous across all cells? Here we examined the in-control ARL and RI properties as we move the cell mean rate from an average of 0,7 to the current value used of 1,2 by examining its in-control ARL and RI using the same threshold of *<sup>z</sup> h* 1,15. The results are recorded in the Table 1 for counts generated using the R function in Appendix A with overall mean mu=0,1, day-of-the-week influences given by bweekday=( -0,025, -0,025, - 0,025, -0,025, -0,025,0,05,0,075), seasonal influences given by bcos=0,1, bsin=-0,1, transitional influences of blags=(0,1,0,06,0,02,0,01), and k=-4.

This plan is also reasonably robust to changes in spatial non-homogeneity in means whereas other plans are not. From Table 1 it is clear that the in-control RI values are reasonably stable across a range of mean values. If these mean changes are not systematic as in table above,

no outbreaks, then each generation in turn would be pruned away until no offspring remain. Even the target region (the original parent) would be pruned indicating no

**3.4 Using the example to demonstrate the stopping rules for the tree growing process**  It is useful to stop partitioning when it is known that the threshold will not be exceeded with further partitions. Assume that the threshold is 4,45 *<sup>z</sup> h* then the node defined by rules La>15, Lo<10 and Lo>7 should be terminal because ( 36 2.5) 4,4 . No partition of this parent space is going to be greater than the threshold of 4,45. Two other nodes would

( 4 3 2 2 3 2 2 1 0 0 1 2,5) 2,9 4,45 .

( 3 2 1 4 2 1 2,5) 2,0 4,45 .

Such a stopping rule stops the unnecessary growing of the surveillance tree, because all offspring will be pruned later. Generally this rule can be defined as stopping when the square root of the total of cell counts for a region is smaller than the threshold plus square root of the smallest cell mean in the region. If the threshold is quite low, then having a threshold on the lower limit for the smoothed count is sensible. Assume that our threshold turns out to be 1,15 *<sup>z</sup> h* then the above stopping rule does not help much in reducing the partitioning complexity. In the above example, it is sometimes helpful to specify when you would fail to respond, e.g., no response is made for any outbreak with counts less than 5 above expected per day. We would then stop if smoothed counts were

Until now, we have assumed that all cells follow the same model. What if the same time trends persist across cells, e.g., the seasonal trend remains the same for each cell but the mean counts change from cell to cell? How does this influence the threshold for when the model is not homogeneous across all cells? Here we examined the in-control ARL and RI properties as we move the cell mean rate from an average of 0,7 to the current value used of 1,2 by examining its in-control ARL and RI using the same threshold of *<sup>z</sup> h* 1,15. The results are recorded in the Table 1 for counts generated using the R function in Appendix A with overall mean mu=0,1, day-of-the-week influences given by bweekday=( -0,025, -0,025, - 0,025, -0,025, -0,025,0,05,0,075), seasonal influences given by bcos=0,1, bsin=-0,1, transitional

This plan is also reasonably robust to changes in spatial non-homogeneity in means whereas other plans are not. From Table 1 it is clear that the in-control RI values are reasonably stable across a range of mean values. If these mean changes are not systematic as in table above,

terminate in generation three Figure 5 using 4,45 *<sup>z</sup> h* . These would be:

**3.5 Lattice with cell means that are spatially non-homogeneous** 

influences of blags=(0,1,0,06,0,02,0,01), and k=-4.

significant outbreak.

a. Lo>9 and La>19 where

b. La<3 and Lo>17 where

below 7,5.


but random in space then these changes are expected to be ironed out by the spatial smoothing. Adaptive plans can be derived using the results in Table 1 based on Sparks (2000) and Sparks et al (2010a).

Table 1. In-control ARLs and RIs for specified thresholds of 1,15, 1,2 and 1,25

#### **4. Simulation study**

This section provides the results of simulation studies comparing the performance of the SCAN statistic with that of the EWMA Surveillance Trees for two outbreak shapes and locations using Average Time to Signal and Average Run Length as the measures of performance. When the size and shape of the scanning window matches that of the outbreak in the SCAN plan, then it remains optimal. However, the flexible partitions of EWMA Surveillance Trees result in robust performance across a range of outbreak dimensions, without being optimal for any specific outbreak size or shape.

We simulate uncorrelated Poisson counts on a 40x40 spatial grid using the R function in Appendix A as described in Section 3.4, with the exception that the mean value is bounded at a minimum of one (i.e., k=0) which makes the counts more homogeneous. Denote these counts by *<sup>i</sup>*, *<sup>j</sup> c* for row *i* and column *j*. These counts are then aggregated locally as follows for non-boundary cells:

$$\mathcal{L}\_{i,j} = \text{floor}(\overline{\mathsf{c}}\_{i,j} + 0, \mathsf{\mathsf{x}} \times (\overline{\mathsf{c}}\_{i \cdot 1, j} + \overline{\mathsf{c}}\_{i, j \cdot 1} + \overline{\mathsf{c}}\_{i + 1, j} + \overline{\mathsf{c}}\_{i, j + 1}) + 0, \mathsf{1} \times (\overline{\mathsf{c}}\_{i \cdot 1, j \cdot 1} + \overline{\mathsf{c}}\_{i \cdot 1, j + 1} + \overline{\mathsf{c}}\_{i + 1, j - 1} + \overline{\mathsf{c}}\_{i + 1, j + 1})) \ / \ $$

where floor means round down to the nearest whole number. Boundary cells have smaller means and their counts are defined as in Table 2.

This generates more realistic spatially correlated in-control counts for infectious or contagious diseases. Note that these counts are not expected to be Poisson distributed, but tend to be under-dispersed counts relative to the Poisson distribution. We assume no knowledge of this under-dispersion in trying to forecast counts. We model the trends in the counts using transitional Poisson regression model and assume that if the counts are incontrol then the one day ahead forecast errors are uncorrelated. In situations where spatial correlation persists in the forecast errors for in-control situations, an alternative approach is models which account for this spatial correlation, such as seeming unrelated Poisson regression models (King, 1989, and Grijalva, T., Bohara, *et al.* 2003), and multivariate Poisson

Spatio-Temporal Disease Surveillance 173

The 1600 simulations were selected balancing both computational effort and estimation accuracy. Traditionally, the standard deviation for RL with an in-control ARL=100 is about 100. The standard deviation from the in-control ARL is expected to be roughly

The out-of-control ARL are calculated in the same way as the in-control ARL, except with each run additional counts with the same mean are added to the outbreak region. The results are reported in Table 1. The recurrence interval for out-of-control situations was not very interesting because generally when a plan signalled it did not stop signalling. The out-of-control ARLs are therefore the zero state out-of-control ARLs. Reporting the incontrol RI values provides readers insight into the steady state in-control performance.

> 10:19 16:19

 108,9 / 37,5 108,9 / 37,5 201,5/70,6 201,5/70,6 108,3 / 69,6 108,3 / 69,6 1 **11,88 11,95** 13,25 12,00 15,42 16,26 2 6,78 **4,86** 7,96 4,91 **5,61** 6,39 3 4,51 **3,55** 5,12 4,11 **3,94** 3,95 4 3,72 **2,76** 3,80 3,20 **2,94** 2,93 5 3,02 **2,35** 3,24 2,55 **2,65** 2,44 6 2,51 2,12 2,72 2,19 **2,10 2,11**  7 2,44 **1,70** 2,50 1,77 **1,97** 1,79 8 2,40 1,55 2,41 1,60 **1,70 1,53**  9 2,34 **1,45** 2,36 1,50 **1,56 1,45**  10 2,10 1,31 2,12 1,32 **1,20 1,30**  11 2,02 **1,12** 2,04 1,16 **1,15** 1,16 12 1,91 **1,05** 1,91 1,07 **1,04** 1,08 Table 3. ARL performance of the surveillance tree plans when the outbreak spans regions of

Recursive partitioning does well when the outbreak is located on the boundary of the target region, partially because fewer partitions are necessary to find it. Another reason is that the boundary cells are where counts are smoothed less and the mean rates are lowest on the boundary, and therefore the signal is the highest. In other words, boundary outbreaks are easier to find using recursive partitioning because fewer generations are needed to find them and the signal-to-noise ratio on the boundary in this simulation process is higher.

The robust performance of plans, i.e., when nothing is known about the shape and size of the outbreak, is an important consideration because in bio-terrorism nothing is known about the size or shape of the outbreak. Therefore, Table 3 looks at a variation in rectangular shaped outbreaks. The recursive partitioning plan's flexibility in partitions strengthens its robust performance across a range of outbreak sizes (see Sparks *et al*. 2011c). In other words,

Recursive partitioning with temporal smoothing ( 0,1 ) 1,404 *<sup>z</sup> h*

> 10:11 1:20

Recursive partitioning with spatio-temporal smoothing

1,15 *<sup>z</sup> h*

0,1 )

> 10:11 1:20

 ( 0,7, 

10:19 16:19

100 / 1600 2,5 .

Recursive partitioning with temporal smoothing ( 0,1 ) 1,255 *<sup>z</sup> h*

> 10:11 1:20

40 cells δ is the increase in mean for cells in the outbreak region.

10:19 16:19

Outbreak

Rows Columns

ARL/RI

regression model (Karlis and Meligkotsidou, 2005, and Bermudez and Karlis, 2011). However, here we model each of the 1600 cells separately using Poisson regression models very similar to that used in Sparks *et al.* (2010a) and Sparks *et al.* (2011a) that use explanatory variables: logarithm of lag counts plus 1, day-of-the-week, public holidays and harmonics. These models are used to produce one-day ahead forecasts which estimate the "Poisson" means for the next day.


Table 2. The process used to simulate cells counts that are spatially correlated.

The recursive partitioning process is then applied to the temporal or spatio-temporal smoothed counts. The *<sup>g</sup> z* statistic is used to assess the departure of smoothed counts from smoothed day ahead forecasted counts. When in-control, *<sup>g</sup> z* has mean zero and approximately a constant variance. The convenience of the statistic is discussed in Sparks *et al.* (2010b) and has been demonstrated to compare very well with the SCAN statistic.

The simulation process generated in-control counts. Simulated bio-terrorism outbreaks are generated by adding to these in-control counts additional Poisson counts for a fixed rectangular region. The outbreak region is then hidden and we examined how early the plans alarm this outbreak. Rectangular outbreak regions were generated involving either 10x4 cells or 20x2 cells. Zero-state in-control ARLs and the Recurrent Interval (RI) were estimated as follows:


regression model (Karlis and Meligkotsidou, 2005, and Bermudez and Karlis, 2011). However, here we model each of the 1600 cells separately using Poisson regression models very similar to that used in Sparks *et al.* (2010a) and Sparks *et al.* (2011a) that use explanatory variables: logarithm of lag counts plus 1, day-of-the-week, public holidays and harmonics. These models are used to produce one-day ahead forecasts which estimate the "Poisson"

*i* 1 , 1 40 *j* 1, 1,j 1,j 1 1,j-1 2,j 2,j 1 2,j 1 floor(c +0,3 (c +c +c )+0,1 (c +c ) ) *<sup>j</sup> c*

1 40 *i* , *j* 1 ,1 i,1 i 1,1 i-1,1 i,2 i-1,2 i 1,2 floor(c +0,3 (c +c +c )+0,1 (c +c ) ) *<sup>i</sup> c*

*i* 1 , *j* 1 1,1 1,1 2,1 1,2 2,2 *c* floor(c +0,3 (c +c )+0,1 c )

*i* 40 , *j* 40 40,40 40,40 39,40 40,39 39,39 *c* floor(c +0,3 (c +c )+0,1 c )

Table 2. The process used to simulate cells counts that are spatially correlated.

*i* 40 , 1 40 *j* 40, 1,j 40,j 1 40,j-1 39,j 39,j 1 39,j 1 floor(c +0,3 (c +c +c )+0,1 (c +c ) ) *<sup>j</sup> c*

1 40 *j* , *j* 40 ,40 i,1 i 1,40 i-1,40 i,39 i-1,39 i 1,39 floor(c +0,3 (c +c +c )+0,1 (c +c ) ) *<sup>i</sup> c*

The recursive partitioning process is then applied to the temporal or spatio-temporal smoothed counts. The *<sup>g</sup> z* statistic is used to assess the departure of smoothed counts from smoothed day ahead forecasted counts. When in-control, *<sup>g</sup> z* has mean zero and approximately a constant variance. The convenience of the statistic is discussed in Sparks *et* 

The simulation process generated in-control counts. Simulated bio-terrorism outbreaks are generated by adding to these in-control counts additional Poisson counts for a fixed rectangular region. The outbreak region is then hidden and we examined how early the plans alarm this outbreak. Rectangular outbreak regions were generated involving either 10x4 cells or 20x2 cells. Zero-state in-control ARLs and the Recurrent Interval (RI) were

 Start the Run Length (RL) at 1. Generate in-control cell counts for the first day, and perform the surveillance tree process. If no tree results after pruning then go to the next

 Increase the RL by 1. Generate in-control cell counts for the next day, perform the surveillance tree process, and if no tree results after pruning, then go to the next step. Repeat the previous step until some branches of the surveillance tree survive the

Generating in-control cell counts for the several consecutive days until there are at least

 Increase the RI by 1. Generate in-control cell counts for the next day, perform the surveillance process, and if no tree results after pruning, then go to the next step. Repeat the previous step until some branches of the surveillance tree survive the

Repeat all steps above 1600 times to give ( , ), 1,2,...,1600 *RL RI i i i* . Average these to give

pruning process. Record the RL (denote this *RL*<sup>1</sup> ) and go to the next step.

pruning process. Record the RI (denote this *RI*<sup>1</sup> ) and go to the next step.

7 consecutive days with no tree surviving, then set RI=7.

the in-control ARL and an estimate of the expected RI.

*al.* (2010b) and has been demonstrated to compare very well with the SCAN statistic.

means for the next day.

estimated as follows:

step.

The 1600 simulations were selected balancing both computational effort and estimation accuracy. Traditionally, the standard deviation for RL with an in-control ARL=100 is about 100. The standard deviation from the in-control ARL is expected to be roughly 100 / 1600 2,5 .

The out-of-control ARL are calculated in the same way as the in-control ARL, except with each run additional counts with the same mean are added to the outbreak region. The results are reported in Table 1. The recurrence interval for out-of-control situations was not very interesting because generally when a plan signalled it did not stop signalling. The out-of-control ARLs are therefore the zero state out-of-control ARLs. Reporting the incontrol RI values provides readers insight into the steady state in-control performance.


Table 3. ARL performance of the surveillance tree plans when the outbreak spans regions of 40 cells δ is the increase in mean for cells in the outbreak region.

Recursive partitioning does well when the outbreak is located on the boundary of the target region, partially because fewer partitions are necessary to find it. Another reason is that the boundary cells are where counts are smoothed less and the mean rates are lowest on the boundary, and therefore the signal is the highest. In other words, boundary outbreaks are easier to find using recursive partitioning because fewer generations are needed to find them and the signal-to-noise ratio on the boundary in this simulation process is higher.

The robust performance of plans, i.e., when nothing is known about the shape and size of the outbreak, is an important consideration because in bio-terrorism nothing is known about the size or shape of the outbreak. Therefore, Table 3 looks at a variation in rectangular shaped outbreaks. The recursive partitioning plan's flexibility in partitions strengthens its robust performance across a range of outbreak sizes (see Sparks *et al*. 2011c). In other words,

agents.

Rows Columns

ARL/RI

Spatio-Temporal Disease Surveillance 175

An advantage of the ability to extend to higher dimensions is that Surveillance Trees could be used to monitor several related diseases or symptoms jointly using disease category as a variable searched over in the recursive partitioning process. This would mean having one false alarm rate for all disease groups being monitored. Such planning may be beneficial in terms of managing the effort an epidemiological unit can devote to investigating false alarms. Similarly the method could also be used for syndromic surveillance – the surveillance of public health records such as Emergency Department visits. Detecting increases in ill-defined syndromes that may indicate the use of new and unknown biological

Because of the efficiency of computation, another advantage of recursive partitioning is that it could be implemented using a hand-held mini-computer in the field to assess whether say counts have increased more than expected for any cluster of a target region. It does not need

Ultimately the EWMA Surveillance Tree methodology is a promising tool for detecting spatio-temporal outbreaks when the size and location of the outbreak is unknown. By incorporating the benefits of temporal and spatial smoothing with a tool for efficiently locating outbreaks, the method is well suited for the surveillance of bioterrorism events.

> 10:11 1:20

 94,5 / 77,2 94,5 / 77,2 80,5/65,6 80,5/65,6 1 14,22 49,66 32,38 59,95 2 5,63 8,14 5,80 15,26 **3,85** 5,31 4,08 5,97 **2,99** 4,39 3,28 4,88 **1,79** 3,55 2,54 3,99 **1,60** 3,07 2,23 3,39 **1,54** 2,63 1,85 2,92 **1,30** 2,30 1,78 2,53 **1,22** 2,09 1,56 2,31 **1,15** 1,98 1,43 2,11 **1,05** 1,78 1,30 1,91 **1,01** 1,68 1,19 1,72

Table 4. ARL performance of the SCAN plan when the outbreak spans regions of 40 cells.

SCAN plan for 10 by 10 scans Threshold p-value=0,999999999999994

> 10:11 1:20

10:19 16:19

a sophisticated computer, as opposed to the SCAN plan.

Threshold p-value=0,9999999999999991

Outbreak SCAN plan for 5 by 5 scans

10:19 16:19

if the outbreak dimensions are known, then the SCAN plan can always be trained to be more efficient than the recursive partitioning plan, and therefore it is preferred in these circumstances. But when nothing is known about its size and shape recursive partitioning is a more flexible and robust approach than the SCAN plan.

Table 3 demonstrates that spatial smoothing prior to applying recursive partitioning of temporally smoothed counts has early detection advantages for the same in-control RI values of approximately 70, e.g., for large shift in mean counts of 12 per cell the ARL is 1,04 as opposed to 1,91 (almost a day later). The spatial smoothing also closes the gap between the in-control ARL and in-control RI, and therefore it is likely to have equivalent steady state early detection advantages.

The future research challenges are to investigate what levels of spatial smoothing are best for early detection. In these studies we only tried 0,7. In addition, there may be an interaction between the temporal smoothing and spatial smoothing. Therefore this needs to be investigated in future work.

The SCAN plan was applied using two separate scanning window sizes: 5 by 5 cells and 10 by 10 cells. As nothing is known about future outbreaks, therefore we consider square m by m scans – for at least the first outbreak considered in Table 4 the 5 by 5 cells scans is close to the m by m scanning plan with the smallest out-of-control ARLs. Unlike the recursive partitioning for the scan plans the expected values were taken as known, and therefore this gives an advantage to the SCAN plan in this comparison. Note that there was less of a difference between the in-control ARL and in-control RI for the scan plans than the surveillance tree plan. This mostly occurs because of the way we estimated the RI. In addition, by not allowing a signal in the first 7 values after a outbreak signal means that there are only a correlation of 0,3 between the signalled time point and the start of the RI for the scan plan, whereas in plans using the temporal EWMA smoothing with 0,1 this is 0,48. Hence the way we estimated RI is going to be more influenced by the previous signal in the surveillance tree plan and therefore its RI is smaller. Clearly, the more the outbreak matches the scanning region the earlier the SCAN plan detects outbreaks, but it is clear that it lacks the flexibility in terms of detecting outbreaks of unknown shape and size. It is worth noting that when the SCAN plan does perform better than the surveillance trees (comparing Tables 3 and 4), the surveillance trees are not far behind in performance whereas the reverse is not true. Figure 7 provides a graphical comparison of the ARL properties. From Figure 7, the SCAN plan 5 by 5 by 10 days is best on average for detecting a 10 by 4 outbreak which is near the centre of the target region but significantly worse for the 2 by 20 outbreak. The 2 by 20 outbreak is quite feasible, e.g., this shaped outbreak would result if the outbreak was started in railway carriages travelling along a specific route.

#### **5. Conclusions**

The recursive partitioning approach can easily be extended to deal with more than two or three dimensions. It is ideal for finding disease outbreaks which cluster in multivariate space that could also include variables such as triage category (severity), age or gender, whereas the SCAN plans become computationally infeasible in higher dimensions. However further research is required to determine whether we can extend the benefit found from spatial smoothing to other dimensions that are not spatial, for example smoothing over age groups.

if the outbreak dimensions are known, then the SCAN plan can always be trained to be more efficient than the recursive partitioning plan, and therefore it is preferred in these circumstances. But when nothing is known about its size and shape recursive partitioning is

Table 3 demonstrates that spatial smoothing prior to applying recursive partitioning of temporally smoothed counts has early detection advantages for the same in-control RI values of approximately 70, e.g., for large shift in mean counts of 12 per cell the ARL is 1,04 as opposed to 1,91 (almost a day later). The spatial smoothing also closes the gap between the in-control ARL and in-control RI, and therefore it is likely to have equivalent steady state

The future research challenges are to investigate what levels of spatial smoothing are best

interaction between the temporal smoothing and spatial smoothing. Therefore this needs to

The SCAN plan was applied using two separate scanning window sizes: 5 by 5 cells and 10 by 10 cells. As nothing is known about future outbreaks, therefore we consider square m by m scans – for at least the first outbreak considered in Table 4 the 5 by 5 cells scans is close to the m by m scanning plan with the smallest out-of-control ARLs. Unlike the recursive partitioning for the scan plans the expected values were taken as known, and therefore this gives an advantage to the SCAN plan in this comparison. Note that there was less of a difference between the in-control ARL and in-control RI for the scan plans than the surveillance tree plan. This mostly occurs because of the way we estimated the RI. In addition, by not allowing a signal in the first 7 values after a outbreak signal means that there are only a correlation of 0,3 between the signalled time point and the start of the RI for

0,48. Hence the way we estimated RI is going to be more influenced by the previous signal in the surveillance tree plan and therefore its RI is smaller. Clearly, the more the outbreak matches the scanning region the earlier the SCAN plan detects outbreaks, but it is clear that it lacks the flexibility in terms of detecting outbreaks of unknown shape and size. It is worth noting that when the SCAN plan does perform better than the surveillance trees (comparing Tables 3 and 4), the surveillance trees are not far behind in performance whereas the reverse is not true. Figure 7 provides a graphical comparison of the ARL properties. From Figure 7, the SCAN plan 5 by 5 by 10 days is best on average for detecting a 10 by 4 outbreak which is near the centre of the target region but significantly worse for the 2 by 20 outbreak. The 2 by 20 outbreak is quite feasible, e.g., this shaped outbreak would result if the outbreak was

The recursive partitioning approach can easily be extended to deal with more than two or three dimensions. It is ideal for finding disease outbreaks which cluster in multivariate space that could also include variables such as triage category (severity), age or gender, whereas the SCAN plans become computationally infeasible in higher dimensions. However further research is required to determine whether we can extend the benefit found from spatial smoothing to other dimensions that are not spatial, for example smoothing over age groups.

the scan plan, whereas in plans using the temporal EWMA smoothing with

In addition, there may be an

0,1 this is

a more flexible and robust approach than the SCAN plan.

for early detection. In these studies we only tried 0,7.

started in railway carriages travelling along a specific route.

early detection advantages.

be investigated in future work.

**5. Conclusions** 

An advantage of the ability to extend to higher dimensions is that Surveillance Trees could be used to monitor several related diseases or symptoms jointly using disease category as a variable searched over in the recursive partitioning process. This would mean having one false alarm rate for all disease groups being monitored. Such planning may be beneficial in terms of managing the effort an epidemiological unit can devote to investigating false alarms. Similarly the method could also be used for syndromic surveillance – the surveillance of public health records such as Emergency Department visits. Detecting increases in ill-defined syndromes that may indicate the use of new and unknown biological agents.

Because of the efficiency of computation, another advantage of recursive partitioning is that it could be implemented using a hand-held mini-computer in the field to assess whether say counts have increased more than expected for any cluster of a target region. It does not need a sophisticated computer, as opposed to the SCAN plan.

Ultimately the EWMA Surveillance Tree methodology is a promising tool for detecting spatio-temporal outbreaks when the size and location of the outbreak is unknown. By incorporating the benefits of temporal and spatial smoothing with a tool for efficiently locating outbreaks, the method is well suited for the surveillance of bioterrorism events.


Table 4. ARL performance of the SCAN plan when the outbreak spans regions of 40 cells.

Spatio-Temporal Disease Surveillance 177

Bermudez, L. & Karlis, D. (2011). Bayesian multivariate Poisson models for insurance

Currie, I., Durban, M. & Eilers, P. (2003). Using p-splines to extrapolate two-dimensional

Chen, J., & Glaz, J. (1996). Two-Dimensional Discrete SCAN Statistics, Statistics and

Grijalva, T., Bohara, A.K. & Berrens, R.P. (2003). A seemingly unrelated Poisson model for revealed and stated preference data, *Applied Economics Letters*, 10: 443-446. Han, S. W., Mei, Y. & Tsui, K.-L. (2008). A comparison between SCAN and CUSUM

Joner M. D., Woodall, W.H. Reynolds, M.R., Fricker, R.D. (2008). A One-sided MEWMA

Karlis, D. & Meligkotsidou, L. (2005). Multivariate Poisson regression with covariance

King, G. (1989), "A Seemingly Unrelated Poisson Regression Model," *Sociological Methods and* 

Kulldorff, M. & Nagarwalla N. (1995). Spatial disease clusters: Detection and Inference.

Kulldorff, M. (1997). A spatial SCAN statistic, *Communications in Statistics: Theory and* 

Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a

Kulldorff, M., Heffernan R., Hartman J., Assunção RM., Mostashari F. (2005). A space-time

Lee, D-J. & Durban, M. (2009). Smooth-CAR mixed models for spatial count data.

Raubertas, R.F. (1989). An analysis of disease surveillance data that uses the geographic

Riedel, K.S. (1993). Optimal data-based kernel estimation of evolutionary spectra. IEEE

*Computational Statistics and Data Analysis*, 53: 2968-2979.

Transactions on signal processing, 41: 2439-2447.

locations of reporting units, *Statistics in Medicine*, 18: 2111-2122.

SCAN statistic, *Journal of the Royal Statistical Society Series A-Statistics in Society* 164:

permutation SCAN statistic for the early detection of disease outbreaks, *PLoS* 

poisson data. In Proceedings of the 18th International *Workshop on Statistical Modeling*, Verbeke, G. Molenberghs, A., & Fieuws, S. (Eds.). Katholieke Universiteit

methods for detecting increases in Poisson rates, *Technical Report*, School of ISyE,

Chart for Health Surveillance. *Quality and Reliability Engineering International.* 24:

if(i==1)ncount<-rpois(1,exp(temp)) else ncount<-c(ncount,rpois(1,exp(temp)))

ratemaking. *Insurance: Mathematics and Economics*, 48:226-236.

Glaz, J., Naus, J. & Wallenstein, S. (2001) *SCAN Statistics*. Springer, New York.

temp[temp<k]<-k

lcounts[1]<-ncount[i]

dw<-dw+1 if(dw>7)dw<-1}

**7. References** 

ncount}

lcounts[2:nlag]<-lcounts[1:(nlag-1)]

Leuven: 97–102.

503-518.

61-72.

*Research*, 17: 235–255.

*Methods*, 26:1481-1496.

*Medicine*, 2:216-224.

Probability Letters, 3 1 : 59-68.

Georgia Institute of Technology.

*Statistics in Medicine*, 14:799-810.

structure. *Statistics and Computing*, 15: 255-265.

Fig. 7. In-control and out-of-control ARL for the SCAN plan and Surveillance Tree plan

#### **6. Appendix**

Inputs to the function are mu=overall mean; bweekday=day-of-the-week influence; bcos and bsin = influence of the season; blags = is the influence of previous counts; nob= number of counts generated; k is the threshold specifying the lowest expected count allowed (Poisson counts are truncated at times).

```
transitionalPoisson<-function(mu,bweekday,bcos,bsin,blags,nob,k){ 
nlag<-length(blags) 
lcounts<-rep(0,nlag) 
dw<-1 
for(i in 1:nob){ 
temp<-
mu+bweekday[dw]+bcos*cos(2*pi*i/365.25)+bsin*sin(2*pi*i/365.25)+sum(blags*log(lcounts
+1))
```
temp[temp<k]<-k if(i==1)ncount<-rpois(1,exp(temp)) else ncount<-c(ncount,rpois(1,exp(temp))) lcounts[2:nlag]<-lcounts[1:(nlag-1)] lcounts[1]<-ncount[i] dw<-dw+1 if(dw>7)dw<-1} ncount}

#### **7. References**

176 Bioterrorism

a) b)

Fig. 7. In-control and out-of-control ARL for the SCAN plan and Surveillance Tree plan

transitionalPoisson<-function(mu,bweekday,bcos,bsin,blags,nob,k){

Inputs to the function are mu=overall mean; bweekday=day-of-the-week influence; bcos and bsin = influence of the season; blags = is the influence of previous counts; nob= number of counts generated; k is the threshold specifying the lowest expected count allowed

mu+bweekday[dw]+bcos\*cos(2\*pi\*i/365.25)+bsin\*sin(2\*pi\*i/365.25)+sum(blags\*log(lcounts

**6. Appendix** 

nlag<-length(blags) lcounts<-rep(0,nlag)

for(i in 1:nob){

dw<-1

temp<-

+1))

(Poisson counts are truncated at times).


**9** 

*USA* 

**Rickettsia and Rickettsial Diseases** 

*Department of Pathology, University of Texas Medical Branch, Galveston, Texas* 

*Rickettsia prowazekii* and *R. rickettsii* are HHS and USDA select agents (http://biosafety.utk.edu/pdfs/salist.pdf), and *R. prowazekii* is a category B bioterrorism agent as determined by the Centers for Disease Control and Prevention (CDC) (http://www.bt.cdc.gov/agent/agentlist-category.asp). The criteria for CDC category B bioterrorism agents are that the organisms are moderately easy to disseminate, cause diseases with moderate morbidity and low mortality and require specific enhancements of diagnostic capacity and enhanced disease surveillance. However, the case fatality ratios of both *R. prowazekii* and *R. rickettsii* may exceed the CDC bioterrorism agent category B level. Epidemic typhus caused by *R. prowazekii* and Rocky Mountain spotted fever (RMSF) caused by *R. rickettsii* can reach up to 60% fatalities without antibiotic treatment and 4% even with antibiotic treatment (Raoult et al., 2004). *Rickettsia* had been explored for biowarfare use. The former Soviet Union developed *R. prowazekii* as a biologic weapon in the 1930s (Alibek K and Handelman S, 2009). During World War II, the Japanese performed human experiments with rickettsial agents for purposes of biologic weapon development during

Epidemic typhus, also known as louse-borne typhus, has been distributed worldwide, was one of the man's major scourges and frequently played a decisive role in wars in Europe from the 15th through 20th centuries, thus affecting the course of European history (Conlon JM, 2007). It killed millions of people through this period. Although worldwide epidemics of typhus may not occur again, the threat of louse-borne typhus is still real as small scale epidemics or large scale epidemics in settings of extreme poverty and natural and manmade disasters. Louse-borne typhus occurs in epidemics when social, economic, or political systems are disrupted exposing a large population such as refugees to louse infestation due to lack of hygiene. This situation has been observed in recent outbreaks of typhus in Burundi, Algeria, Peru, and Russia. In 1997, it was estimated that as many as 100,000 cases of typhus occurred in the refugee camps of Burundi during a civil war

RMSF originated as an emerging infectious disease on the western *frontier* in the Rocky Mountains. Now the disease is found all over the United States, and over half of the cases occur in the southeastern and south-central regions of the United States and in South

America (Center for Disease Control and prevention[CDC], 2010).

**1. Introduction** 

(Raoult et al., 2004).

their occupation of China (Harris S, 1992).

Xue-jie Yu and David H. Walker

