**4. Results**

8 Will-be-set-by-IN-TECH

Kéry, Royle, Plattner & Dorazio 2009, Walls et al. 2011); however, this idea is not universally applicable. For example, if the occurrence of one species depends on the presence or absence of another species (as might occur between a predator and prey species or between strongly competing species), then ecological similarity would not be a reasonable assumption. In this case a model must be formulated to specify the pattern of co-occurrence that arises from interspecific interactions (MacKenzie et al. 2004, Waddle et al. 2010). The formulation of statistical models for inferring interspecific interactions in communities of species is an important and developing area of research (Dorazio et al. 2010).

In assemblages of ecologically similar species, it seems reasonable to use distributional assumptions to model unobserved sources of heterogeneity in probabilities of species occurrence and detection. For example, occurrence probabilities may be low for some species (the rare ones) and high for others, but all species are related in the sense that they belong to a larger community of ecologically similar species. By modeling the heterogeneity among species in this way, the data observed for any individual species influence the parameter estimates of every other species in the community. In other words, inferences about an individual species do not depend solely on the observations of that species because the inferences borrow strength from the observations of other species. A practical manifestation of this multispecies approach is that the estimate of a parameter (e.g., occurrence probability) of a single species reflects a compromise between the estimate that would be obtained by analyzing the data from each species separately and the average value of that parameter among all species in the community. In the statistical literature this phenomenon is called "shrinkage" (Gelman et al. 2004) because each species-specific estimate is shrunk in the direction of the estimated average parameter value. Of course, the amount of shrinkage depends on the relative amount of information about the parameter in the observations of each species versus the information about the mean value of that parameter. An important benefit of shrinkage is that it allows parameters to be estimated for a species that is detected with such low frequency that its parameters could otherwise not be estimated. Such species are often the rarest members of the community, and it is crucial that these species be included in the analysis to ensure that estimates of biodiversity are accurate.

In the present analysis we use a normal distribution

$$
\begin{pmatrix} b_{0i} \\ a_{0i} \end{pmatrix}
\overset{iid}{\sim}
\text{Normal}\left(
\begin{pmatrix} \beta_0 \\ \alpha_0 \end{pmatrix},
\begin{pmatrix}
\sigma^2_{b0} & \rho\,\sigma_{b0}\sigma_{a0} \\
\rho\,\sigma_{b0}\sigma_{a0} & \sigma^2_{a0}
\end{pmatrix}
\right), \tag{3}
$$

to specify the variation in occurrence and detection probabilities among ant species. The parameters $\sigma_{b0}$ and $\sigma_{a0}$ denote the magnitude of this variation, and $\rho$ parameterizes the extent to which species occurrence and detection probabilities are correlated.

We also use the normal distribution to specify variation among the species-specific effects of covariates on occurrence. Specifically, we assume $b_{li} \overset{iid}{\sim} \text{Normal}(\beta_l, \sigma^2_{bl})$ (for $l = 1, \ldots, p$), so that the effects of different covariates are assumed to be mutually independent and uncorrelated.

**3.2 Parameter estimation**

The hierarchical model described in Section 3.1 would be impossible to fit using classical methods owing to the high-dimensional and analytically intractable integrations involved

#### **4.1 Effects of covariates on species occurrence**

The posterior model probabilities calculated in our analysis of forest and bog data sets are only mildly sensitive to our choice of priors for the logit-scale parameters of the model (Table 2). Recall that these parameters are of primary interest in assessing the relative contributions of geographic- and site-level covariates. Regardless of the prior distribution used (Uniform or Jeffreys' (see appendix)), the model with highest probability includes all four covariates (LAT, LAI, GSF, ELEV) in the analysis of data observed at forest sample sites and a single covariate (ELEV) in the analysis of data observed at bog sample sites. However, the model without any covariates has nearly equal probability to the favored model of the bog data, and the combined probability of these two models far exceeds the probabilities of all other models. These results suggest that occurrence probabilities of ant species found in the bog habitat are not strongly influenced by the LAT or AREA covariates, either alone or in combination with other covariates.

Modern Methods of Estimating Biodiversity from Presence-Absence Surveys 287

Fig. 1. Estimated effects of covariates on occurrence probabilities of ant species in forest habitat. (Four panels, each plotting probability of occurrence against one covariate: latitude, leaf area index, light availability, and elevation.)

Each of the four covariates used to model species occurrences in the forest habitat has, on average, a negative effect on occurrence probabilities. Estimates of $\beta_l$ and 95% credible intervals are as follows: LAT, −0.717 (−1.217, −0.257); LAI, −0.850 (−1.302, −0.440); GSF, −0.494 (−0.916, −0.098); ELEV, −0.662 (−1.014, −0.339). However, as illustrated in Figure 1, there is considerable variation among species in the magnitude of these effects. Similarly, the estimated occurrence probabilities of ants in the bog habitat decrease with ELEV ($\hat{\beta}_1 = -0.500$; 95% credible interval (−1.019, −0.098)), and there is considerable variation among species ($\hat{\sigma}_{b1} = 0.320$; 95% credible interval (0.014, 1.000)) in the magnitude of ELEV effects.
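Because these effects are reported on the logit scale, it can help to see what a coefficient of this size means for a probability. The sketch below is illustrative only: it takes the average ELEV effect reported above, assumes a hypothetical baseline occurrence probability of 0.5, and treats "one unit" as one unit of whatever scale the covariate was entered on in the model (that scale is not restated in this section).

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Posterior estimate of the average ELEV effect in forest habitat (from the text).
BETA_ELEV = -0.662

def occurrence_probability(shift, beta=BETA_ELEV, baseline=0.5):
    """Occurrence probability after moving the covariate `shift` units away
    from the point where the probability equals `baseline`.
    The baseline of 0.5 is a hypothetical value chosen for illustration."""
    return inv_logit(logit(baseline) + beta * shift)

for shift in (-1.0, 0.0, 1.0):
    print(f"shift={shift:+.0f}  psi={occurrence_probability(shift):.3f}")
```

Under these assumptions a one-unit increase in the covariate lowers the occurrence probability from 0.5 to roughly 0.34. Species-specific effects vary around the average, so individual species may respond more or less strongly.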

#### **4.2 Estimates of biodiversity**

Our pitfall trap surveys revealed *n* = 34 distinct species of ants at the forest sample sites and *n* = 19 species at the bog sample sites. The estimated species richness of ants found in the forest habitat ($\hat{N} = 43$; 95% interval = (37, 70)) is nearly twice the estimated richness of ants in the bog habitat ($\hat{N} = 25$; 95% interval = (21, 25)); however, the estimate of forest ant richness is relatively imprecise and the estimate of bog ant richness is strongly influenced by the upper bound (*M* = 25 species).
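The arithmetic behind these comparisons can be laid out explicitly. The following sketch uses only the numbers quoted above; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Richness summaries quoted in the text.
forest = {"observed": 34, "estimate": 43, "interval": (37, 70)}
bog = {"observed": 19, "estimate": 25, "interval": (21, 25)}
M_BOG = 25  # upper bound on bog species richness stated in the text

def undetected(habitat):
    """Estimated number of species present but never captured."""
    return habitat["estimate"] - habitat["observed"]

def interval_width(habitat):
    """Width of the 95% interval, a rough measure of precision."""
    low, high = habitat["interval"]
    return high - low

richness_ratio = forest["estimate"] / bog["estimate"]
bog_hits_bound = bog["interval"][1] == M_BOG

print("undetected forest species:", undetected(forest))
print("undetected bog species:", undetected(bog))
print("forest/bog richness ratio:", round(richness_ratio, 2))
print("interval widths (forest, bog):", interval_width(forest), interval_width(bog))
print("bog interval reaches upper bound M:", bog_hits_bound)
```

The wide forest interval reflects the imprecision noted above, and the bog interval ending exactly at *M* is the sense in which that estimate is constrained by the upper bound.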

The numbers of species found in forest and bog communities are perhaps better compared using estimates of species richness at the sample sites. These measures of alpha diversity are plotted against each site's elevation in Figure 2, which also includes the number of ant species actually captured. The estimated richness at sites in the forest habitat usually exceeds that at sites in the bog habitat when the effects of elevation on species occurrences are taken into account. Note also that a site's estimated species richness can be much higher than the numbers of species captured because capture probabilities are much lower than one for most species (Tables 3 and 4).
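To see why low capture probabilities leave many species unrecorded at a site, consider the chance that a species present at a site is captured in at least one replicate trap. This sketch assumes that the capture probability applies independently to each replicate trap; the number of traps per site is a hypothetical value, and the three capture probabilities are the forest medians listed in Table 3.

```python
def prob_detected_at_least_once(p, n_traps):
    """Probability that a species present at a site is captured in at least
    one of n_traps independent replicate traps, each with capture probability p."""
    return 1.0 - (1.0 - p) ** n_traps

# Median capture probabilities for three forest species (Table 3).
capture = {
    "Myrmica punctiventris": 0.248,
    "Lasius alienus": 0.053,
    "Formica argentea": 0.011,
}

N_TRAPS = 10  # hypothetical number of replicate traps per site (not from the text)

for species, p in capture.items():
    detected = prob_detected_at_least_once(p, N_TRAPS)
    print(f"{species}: {detected:.3f}")
```

Under these assumptions a species with a capture probability near 0.01 would be missed at most of the sites it occupies, which is why estimated site richness can exceed the number of species captured.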

Site-specific estimates of beta diversity between bog and forest communities of ants are relatively high, ranging from 0.71 to 1.0 (Figure 3). These estimates also generally exceed the beta diversities between ants from different sites within each habitat (Figure 4), adding further support for the hypothesis that composition of ant species differs greatly between forest and bog habitats.
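The definition of beta diversity is not restated in this section. As a rough illustration of an index bounded above by 1, the sketch below assumes a Jaccard-type dissimilarity (one minus the proportion of shared species) between two species lists; both the choice of index and the example species lists are assumptions made for illustration.

```python
def jaccard_dissimilarity(community_a, community_b):
    """One minus the Jaccard similarity of two species lists.
    Returns 0.0 for identical communities and 1.0 when no species are shared."""
    a, b = set(community_a), set(community_b)
    union = a | b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1.0 - len(a & b) / len(union)

# Hypothetical occurrence lists for one bog site and the adjacent forest site.
bog_site = ["Myrmica lobifrons", "Dolichoderus pustulatus", "Tapinoma sessile"]
forest_site = ["Aphaenogaster rudis", "Myrmica punctiventris", "Tapinoma sessile"]

print(round(jaccard_dissimilarity(bog_site, forest_site), 2))
```

In a Bayesian analysis an index like this would be computed for every posterior draw of the latent occurrence states, which is one way to obtain credible intervals of the kind shown in Figure 3.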

### **5. Discussion**

#### **5.1 Analysis of ant species**

It is interesting to compare the results of our analyses with the results reported by Gotelli & Ellison (2002), who analyzed the same data but did not account for errors in detection of species. Gotelli & Ellison (2002) used linear regression models to estimate associations between the number of observed species (which was referred to as "species density") and environmental covariates. For bog ants Gotelli & Ellison (2002) reported a significant association between species density and latitude (*P* = 0.041) and a marginally significant association between species density and vegetation structure (as measured by the first principal-component score; *P* = 0.081). Collectively, these two variables accounted for about 30% of the variation in species density. In the present analysis of the bog data, the best fitting model included the effect of a single covariate (ELEV) on ant species occurrence probabilities, though a model without any covariates was a close second (Table 2). In the analysis of forest ants Gotelli & Ellison (2002) reported significant positive associations between species


| Species | Capture: median | Capture: 2.5% | Capture: 97.5% | Occurrence: median | Occurrence: 2.5% | Occurrence: 97.5% |
|---|---|---|---|---|---|---|
| *Amblyopone pallipes* | 0.028 | 0.008 | 0.073 | 0.043 | 0.005 | 0.237 |
| *Aphaenogaster rudis* (species complex) | 0.237 | 0.209 | 0.269 | 0.779 | 0.539 | 0.927 |
| *Camponotus herculeanus* | 0.090 | 0.062 | 0.123 | 0.255 | 0.104 | 0.482 |
| *Camponotus nearcticus* | 0.035 | 0.013 | 0.074 | 0.083 | 0.014 | 0.316 |
| *Camponotus novaeboracensis* | 0.017 | 0.008 | 0.037 | 0.454 | 0.121 | 0.897 |
| *Camponotus pennsylvanicus* | 0.131 | 0.107 | 0.158 | 0.587 | 0.322 | 0.819 |
| *Dolichoderus pustulatus* | 0.011 | 0.002 | 0.053 | 0.042 | 0.003 | 0.389 |
| *Formica argentea* | 0.011 | 0.001 | 0.053 | 0.044 | 0.003 | 0.411 |
| *Formica glacialis* | 0.012 | 0.002 | 0.055 | 0.045 | 0.003 | 0.413 |
| *Formica neogagates* | 0.096 | 0.049 | 0.163 | 0.038 | 0.005 | 0.166 |
| *Formica obscuriventris* | 0.010 | 0.001 | 0.051 | 0.046 | 0.003 | 0.448 |
| *Formica subaenescens* | 0.051 | 0.029 | 0.081 | 0.229 | 0.085 | 0.476 |
| *Formica subintegra* | 0.166 | 0.083 | 0.284 | 0.029 | 0.003 | 0.140 |
| *Formica subsericea* | 0.248 | 0.184 | 0.320 | 0.059 | 0.009 | 0.218 |
| *Lasius alienus* | 0.053 | 0.035 | 0.075 | 0.499 | 0.260 | 0.761 |
| *Lasius flavus* | 0.011 | 0.002 | 0.051 | 0.043 | 0.003 | 0.397 |
| *Lasius neoniger* | 0.036 | 0.013 | 0.076 | 0.097 | 0.020 | 0.333 |
| *Lasius speculiventris* | 0.012 | 0.003 | 0.040 | 0.080 | 0.009 | 0.502 |
| *Lasius umbratus* | 0.017 | 0.007 | 0.037 | 0.429 | 0.109 | 0.931 |
| *Myrmecina americana* | 0.011 | 0.002 | 0.052 | 0.042 | 0.003 | 0.398 |
| *Myrmica detritinodis* | 0.078 | 0.049 | 0.117 | 0.169 | 0.055 | 0.378 |
| *Myrmica lobifrons* | 0.056 | 0.036 | 0.082 | 0.299 | 0.118 | 0.568 |
| *Myrmica punctiventris* | 0.248 | 0.218 | 0.279 | 0.739 | 0.474 | 0.911 |
| *Myrmica* species 1 ("AF-scu") | 0.102 | 0.078 | 0.131 | 0.368 | 0.152 | 0.642 |
| *Myrmica* species 2 ("AF-smi") | 0.064 | 0.039 | 0.097 | 0.148 | 0.036 | 0.385 |
| *Prenolepis imparis* | 0.012 | 0.002 | 0.054 | 0.031 | 0.002 | 0.334 |
| *Stenamma brevicorne* | 0.017 | 0.005 | 0.046 | 0.103 | 0.014 | 0.526 |
| *Stenamma diecki* | 0.030 | 0.014 | 0.056 | 0.302 | 0.097 | 0.725 |
| *Stenamma impar* | 0.049 | 0.026 | 0.081 | 0.168 | 0.052 | 0.396 |
| *Stenamma schmitti* | 0.013 | 0.005 | 0.030 | 0.252 | 0.046 | 0.753 |
| *Tapinoma sessile* | 0.023 | 0.010 | 0.047 | 0.171 | 0.035 | 0.552 |
| *Temnothorax ambiguus* | 0.056 | 0.015 | 0.138 | 0.031 | 0.003 | 0.150 |
| *Temnothorax curvispinosus* | 0.057 | 0.022 | 0.113 | 0.037 | 0.005 | 0.169 |
| *Temnothorax longispinosus* | 0.086 | 0.062 | 0.114 | 0.333 | 0.141 | 0.587 |

Table 3. Estimated probabilities of capture and occurrence (with 95% credible intervals) for ant species captured in forest habitat. Probabilities are estimated at the average value of the covariates observed in the sample.

Fig. 2. Estimates of site-specific species richness (open circles with 95% credible intervals) for ants in forest habitat (upper panel) and bog habitat (lower panel) versus elevation. Number of species captured at each site (closed circles) is shown for comparison. (Both panels plot number of species against elevation (m).)



Fig. 3. Estimates of beta diversity (open circles with 95% credible intervals) between ant communities present in bog and forest habitats at each sample location. (Horizontal axis: beta diversity between habitats. Sample locations labeled in the figure: Arcadia Bog (MA), Carmi Bog (VT), Clayton Bog (MA), Chickering Bog (VT), Chockalog Bog (MA), Colchester Bog (VT), Hawley Bog (MA), Molly Bog (VT), Moose Bog (VT), Otis Bog (MA), Peacham Bog (VT), Ponkapoag Bog (MA), Quag Bog (MA), Round Pond (MA), Bourne-Hadley Ponds (MA), Halls Brook Cedar Swamp (MA), Shankpainter Ponds (MA), Snake Mountain (VT), North Springfield (VT), Swift River (MA), Tobey Pond Bog (CT), Lake Jones (MA).)


| Species | Capture: median | Capture: 2.5% | Capture: 97.5% | Occurrence: median | Occurrence: 2.5% | Occurrence: 97.5% |
|---|---|---|---|---|---|---|
| *Camponotus herculeanus* | 0.014 | 0.002 | 0.050 | 0.190 | 0.040 | 0.731 |
| *Camponotus novaeboracensis* | 0.066 | 0.043 | 0.094 | 0.348 | 0.172 | 0.571 |
| *Camponotus pennsylvanicus* | 0.007 | 0.001 | 0.040 | 0.134 | 0.017 | 0.723 |
| *Dolichoderus plagiatus* | 0.015 | 0.002 | 0.073 | 0.105 | 0.016 | 0.515 |
| *Dolichoderus pustulatus* | 0.090 | 0.071 | 0.112 | 0.701 | 0.491 | 0.863 |
| *Formica neorufibarbis* | 0.007 | 0.001 | 0.040 | 0.126 | 0.015 | 0.691 |
| *Formica subaenescens* | 0.353 | 0.308 | 0.402 | 0.371 | 0.194 | 0.580 |
| *Formica subsericea* | 0.014 | 0.004 | 0.037 | 0.295 | 0.083 | 0.774 |
| *Lasius alienus* | 0.020 | 0.006 | 0.054 | 0.191 | 0.051 | 0.550 |
| *Lasius speculiventris* | 0.050 | 0.010 | 0.138 | 0.077 | 0.014 | 0.263 |
| *Lasius umbratus* | 0.008 | 0.001 | 0.034 | 0.210 | 0.037 | 0.766 |
| *Leptothorax canadensis* | 0.007 | 0.001 | 0.039 | 0.142 | 0.018 | 0.764 |
| *Myrmica lobifrons* | 0.559 | 0.529 | 0.589 | 0.916 | 0.748 | 0.984 |
| *Myrmica punctiventris* | 0.006 | 0.001 | 0.039 | 0.150 | 0.018 | 0.783 |
| *Myrmica* species 1 ("AF-scu") | 0.015 | 0.002 | 0.073 | 0.102 | 0.015 | 0.486 |
| *Myrmica* species 2 ("AF-smi") | 0.008 | 0.001 | 0.034 | 0.231 | 0.041 | 0.826 |
| *Stenamma brevicorne* | 0.007 | 0.001 | 0.041 | 0.149 | 0.019 | 0.772 |
| *Tapinoma sessile* | 0.167 | 0.133 | 0.207 | 0.356 | 0.184 | 0.561 |
| *Temnothorax ambiguus* | 0.007 | 0.001 | 0.042 | 0.127 | 0.017 | 0.697 |

Table 4. Estimated probabilities of capture and occurrence (with 95% credible intervals) for ant species captured in bog habitat. Probabilities are estimated at the average value of the covariates observed in the sample.



Fig. 4. Distribution of estimates of beta diversity computed for all pairwise combinations of samples collected in forest habitat (upper panel) or bog habitat (lower panel). (Both panels plot relative frequency against beta diversity between sample sites.)

density and the first two principal components of vegetation structure, and they reported significant negative associations between species density and four other covariates (LAT, LAI, GSF, and ELEV). Collectively, these six regressors accounted for 83% of the variation in species density. In the present analysis of forest data, the best-fitting model included the effects of four covariates (LAT, LAI, GSF, and ELEV), and the estimated effects of these covariates were all significantly negative, which agrees qualitatively with the regression results of Gotelli & Ellison (2002), though principal components of vegetation structure were not included in the present analysis.

In comparing the results obtained using the linear regression model (Gotelli & Ellison 2002) and the hierarchical model of species occurrences and captures, we note that while both models revealed the same set of negative predictors of ant occurrence in forest habitat (Figure 1), the regression model's associations between species density of bog ants and two predictors (latitude and vegetation structure) are not supported by the hierarchical model. Part of the difference in these results may be attributed to the fact that slightly different data sets were used in the two analyses. Species detected using tuna baits, hand collections, and leaf-litter sorting (in forest habitats) were included in the regression analysis, whereas only species captured in pitfall traps were used in the present analysis. However, these differences in data are relatively minor because the alternative sampling methods used by Gotelli & Ellison (2002) added only a few rare species to their analysis. Instead, we believe the different results stem primarily from differences in the underlying assumptions of these two models. The regression model assumes (1) that the effects of environmental covariates are identical for each species and are linearly related to species density and (2) that residual errors in species density are normally distributed and do not distinguish between measurement errors and heterogeneity among species in their response to covariates. In contrast, the hierarchical model assumes that the effects of environmental covariates differ among species (Figure 1) and that occurrence probabilities and capture probabilities can be estimated separately for each species (Tables 3 and 4) owing to the replicated sampling at each site.
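The separation of occurrence from capture that replicated sampling makes possible can be illustrated with a single-species likelihood. The sketch below is a simplified, hypothetical version of that idea (one species, no covariates, a fixed number of replicate traps per site); it is not the full multispecies model fitted in this chapter, and the function and parameter names are illustrative.

```python
import math

def site_likelihood(y, n_traps, psi, p):
    """Marginal likelihood of y captures in n_traps replicate traps at one site.

    psi is the occurrence probability and p the per-trap capture probability.
    A site with zero captures is either occupied but missed in every trap,
    or unoccupied; a site with captures must be occupied."""
    binom = math.comb(n_traps, y) * p ** y * (1.0 - p) ** (n_traps - y)
    if y > 0:
        return psi * binom
    return psi * binom + (1.0 - psi)

def log_likelihood(counts, n_traps, psi, p):
    """Log-likelihood of capture counts from several independent sites."""
    return sum(math.log(site_likelihood(y, n_traps, psi, p)) for y in counts)

# Hypothetical capture counts at six sites, five replicate traps per site.
counts = [0, 0, 2, 0, 1, 3]
print(round(log_likelihood(counts, 5, psi=0.6, p=0.3), 3))
```

Without replication (a single trap per site) the parameters `psi` and `p` enter this likelihood only through their product, so they could not be estimated separately. A regression on observed species counts has no such structure at all, which is one reason the two approaches can disagree.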

The estimated probabilities of occurrence and capture of each species are of great interest in themselves and highlight differences in species compositions between ants found in bog and forest habitats. For example, the forest species with the highest occurrence probability was *Aphaenogaster rudis* (species complex) ($\hat{\psi} = 0.779$). This species is taxonomically unresolved and currently includes a complex of poorly differentiated species across its geographic range (Umphrey 1996). *Myrmica punctiventris* had the second highest occurrence probability ($\hat{\psi} = 0.739$). Both of these species are characteristic of forest ant assemblages in New England. *A. rudis* (species complex) was never captured in bogs and the occurrence probability of *M. punctiventris* in bogs was only 0.150, almost a fivefold difference between the two habitats. In bogs the highest occurrence probabilities were estimated for the bog specialist, *Myrmica lobifrons* ($\hat{\psi} = 0.916$), and for *Dolichoderus pustulatus* ($\hat{\psi} = 0.701$), a generalist species that sometimes builds carton nests in dead leaves of the carnivorous pitcher plant *Sarracenia purpurea* (A. Ellison and N. Gotelli, personal communication). Occurrence probabilities of these species in forests were only 0.299 (*M. lobifrons*) and 0.042 (*D. pustulatus*), a 3- to 16-fold difference. These pronounced differences in the occurrence probabilities of the most common species in each habitat suggest that the two habitats support distinctive ant assemblages, a conclusion also supported by the relatively high estimates of beta diversity between habitats (Figure 3).
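The fold differences quoted for occurrence probabilities follow directly from the median estimates in Tables 3 and 4. A small check, with an illustrative helper function:

```python
# Median occurrence probabilities from Table 3 (forest) and Table 4 (bog).
forest_psi = {
    "Myrmica punctiventris": 0.739,
    "Myrmica lobifrons": 0.299,
    "Dolichoderus pustulatus": 0.042,
}
bog_psi = {
    "Myrmica punctiventris": 0.150,
    "Myrmica lobifrons": 0.916,
    "Dolichoderus pustulatus": 0.701,
}

def fold_difference(a, b):
    """Ratio of the larger probability to the smaller one."""
    return max(a, b) / min(a, b)

for species in forest_psi:
    ratio = fold_difference(forest_psi[species], bog_psi[species])
    print(f"{species}: {ratio:.1f}-fold")
```

These ratios compare posterior medians only and ignore the width of the credible intervals, which is considerable for the rarer species.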

Although occurrence and capture probabilities were positively correlated among species (Figure 5), a few rare forest species (*Formica subintegra* and *Formica subsericea*) had relatively high capture probabilities. In the forest habitat the two species with the highest capture probabilities were *F. subsericea* ($\hat{p} = 0.248$) and *Myrmica punctiventris* ($\hat{p} = 0.248$). In bogs these species had capture probabilities of only 0.014 (*F. subsericea*) and 0.006 (*M. punctiventris*), a 17- to 41-fold difference. The two species with the highest capture probabilities in the bog habitat were *Myrmica lobifrons* ($\hat{p} = 0.559$), the bog specialist, and *Formica subaenescens* ($\hat{p} = 0.353$). In the forest habitat these species had capture probabilities of only 0.056 (*M. lobifrons*) and 0.051 (*F. subaenescens*), a 7- to 9-fold difference.

The estimated probabilities of occurrence of most species in the forest habitat decreased with latitude (Figure 1), which is consistent with previous regression analyses of species density (Gotelli & Ellison 2002, figure 1). However, the occurrence probabilities of three species (*Camponotus herculeanus*, *Lasius alienus*, and *Myrmica detritinodis*) significantly increased with latitude. Two of these species, *C. herculeanus* and *M. detritinodis*, are boreal, cold-climate specialists (Ellison et al. 2012), whereas *L. alienus* has a more widespread distribution. Under climate change scenarios of increasing temperatures at high latitudes, species whose occurrence probabilities currently increase with latitude might disappear from New England as their ranges shift northward; other species in the assemblage might show no change in distribution, or might increase in occurrence.

To summarize the comparisons between our results and those reported by Gotelli & Ellison (2002), we note that within-site replication of presence-absence surveys allowed us to estimate species-specific probabilities of capture and occurrence and species-specific effects of environmental covariates. These results represent a considerable advance over traditional regression analyses of observed species density. Using a hierarchical approach to model building, we were able to infer sources of variation in measures of biodiversity, such as the effect of elevation on site-specific species richness (Figure 2) and the effect of habitat on beta diversity (Figure 3), and to determine how these community-level patterns were related to differences in occurrence of individual species. Although many macroecological data sets collected at large spatial scales do not include within-site replicates, regional studies often use replicated sampling grids of traps or baits (Gotelli et al. 2011) that are ideal for the kind of analysis we have described. We therefore recommend that within-site replication be used in presence-absence surveys of communities, particularly when surveys are undertaken to assess levels of biodiversity.

#### **5.2 Benefits and challenges of hierarchical modeling**

Our analysis of the ant data illustrates the benefits of using hierarchical models to estimate measures of biodiversity and other community-level characteristics. By adopting a hierarchical approach to model building, an analyst actually specifies two models: one for the ecologically relevant parameters (or state variables) that are usually of primary interest but are not directly observable, and a second model for the observed data, which are related to the ecological parameters but are influenced also by sampling methods and sampling errors. This dichotomy between models of ecological parameters and models of data is extremely useful and has been exploited to solve a variety of inference problems in ecology (Royle & Dorazio 2008).
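The distinction between a model for unobserved ecological states and a model for the observed data can be made concrete with a small simulation. This is a minimal sketch for one species under assumed values: the occurrence probability, capture probability, number of sites, and number of replicate traps are all hypothetical, and the function name is illustrative.

```python
import random

def simulate_species(psi, p, n_sites, n_traps, rng):
    """Simulate one species under a two-level model.

    State model: z[i] = 1 if the species occurs at site i (probability psi).
    Data model:  y[i] = number of traps (out of n_traps) that capture the
                 species; captures are possible only where z[i] = 1."""
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    y = [
        sum(1 for _ in range(n_traps) if rng.random() < p) if occupied else 0
        for occupied in z
    ]
    return z, y

rng = random.Random(1)  # fixed seed so the sketch is repeatable
z, y = simulate_species(psi=0.5, p=0.05, n_sites=200, n_traps=10, rng=rng)

true_fraction = sum(z) / len(z)                      # fraction of sites occupied
naive_fraction = sum(1 for c in y if c > 0) / len(y)  # fraction with a capture

print("fraction of sites occupied:", true_fraction)
print("fraction of sites with at least one capture:", naive_fraction)
```

The fraction of sites with at least one capture can never exceed the fraction actually occupied, and with a low capture probability it falls well short. An analysis of raw detections therefore confounds where a species lives with how easily it is caught, which is the problem the two-model structure is designed to address.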

Fig. 5. Estimates of species-specific capture probability versus occurrence probability for ants in forest habitat (upper panel) and bog habitat (lower panel). Note difference in scale between ordinates of upper and lower panels. (Both panels plot capture probability against occurrence probability.)


#### **5.2 Benefits and challenges of hierarchical modeling**

Our analysis of the ant data illustrates the benefits of using hierarchical models to estimate measures of biodiversity and other community-level characteristics. By adopting a hierarchical approach to model building, an analyst actually specifies two models: one for the ecologically relevant parameters (or state variables) that are usually of primary interest but are not directly observable, and a second model for the observed data, which are related to the ecological parameters but are influenced also by sampling methods and sampling errors. This dichotomy between models of ecological parameters and models of data is extremely useful and has been exploited to solve a variety of inference problems in ecology (Royle & Dorazio 2008).
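The two-part structure can be illustrated with a minimal simulation sketch for a single species. It is not the model fitted to the ant data: the function name `simulate_surveys` and the values chosen for `psi` (occurrence probability), `p` (capture probability), the number of sites, and the number of replicate surveys are all assumptions made for illustration. The first step draws the unobservable occurrence states; the second generates the observed detection counts conditional on those states.

```python
import random


def simulate_surveys(psi, p, n_sites, n_reps, seed=0):
    """Simulate one species under a two-level (state + observation) model.

    All argument values are hypothetical; nothing here is estimated from data.
    """
    rng = random.Random(seed)
    # Model of the ecological state: latent presence (1) or absence (0) per site.
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    # Model of the data: number of detections in n_reps surveys.
    # A species can be detected only at sites where it is present.
    y = [
        sum(1 for _ in range(n_reps) if zi == 1 and rng.random() < p)
        for zi in z
    ]
    return z, y


z, y = simulate_surveys(psi=0.6, p=0.2, n_sites=2000, n_reps=3)
true_occupancy = sum(z) / len(z)
naive_occupancy = sum(1 for yi in y if yi > 0) / len(y)
print(f"true proportion of occupied sites:    {true_occupancy:.3f}")
print(f"proportion with at least 1 detection: {naive_occupancy:.3f}")
```

Because a species can be present yet never captured, the naive proportion of sites with at least one detection falls below the true proportion of occupied sites. Under these assumed values its expectation is 0.6 × (1 − 0.8³) ≈ 0.29 rather than 0.6, which is why the observation model is needed to recover the ecological state.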


**6. Acknowledgments**

Collection of the original ant dataset was supported by NSF grants 98-05722 and 98-08504 to AME and NJG, respectively, and by contract MAHERSW99-17 from the Massachusetts Natural Heritage and Endangered Species Program to AME. Additional support for AME's and NJG's research on the distribution of ants in response to climatic change is provided by the U.S. Department of Energy through award DE-FG02-08ER64510. The statistical modeling and analysis was conducted as a part of the Binary Matrices Working Group at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF Award #EF-0832858, with additional support from The University of Tennessee, Knoxville.

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

**7. Appendix: Technical details**

#### **7.1 Model fitting and software**

Here we describe methods for fitting our hierarchical model using the Markov chain Monte Carlo (MCMC) algorithms implemented in the software package JAGS (Just Another Gibbs Sampler), which is freely available at the following web site: http://mcmc-jags.sourceforge.net. This software allows the user to specify a model in terms of its underlying assumptions, which include the distributions assumed for the observed data and the model's parameters. The latter distributions include priors, which are needed, of course,

In our hierarchical model of replicated, presence-absence surveys, the parameter of primary ecological interest is the community's incidence matrix. This matrix is only partially observable because a species may be present at a sample location but not observed in the surveys. We use a binomial sampling model to specify the probability of detection (or capture) of each species and thereby to account for detection errors in the observed data. In this way estimates of the community's incidence matrix are automatically adjusted for the imperfect detectability of each species.
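One way to see how the binomial sampling model adjusts for imperfect detection is to compute, by Bayes' rule, the probability that a species is present at a site where it was never captured. The sketch below assumes a known occurrence probability `psi`, a known capture probability `p`, and `n_reps` independent replicate surveys. The function name and the numerical inputs are hypothetical, and in practice these parameters are estimated rather than known.

```python
def prob_present_given_undetected(psi, p, n_reps):
    """Pr(species present | zero detections in n_reps surveys).

    psi, p and n_reps are assumed known here purely for illustration.
    """
    # Present at the site but missed in every one of the n_reps surveys.
    present_but_missed = psi * (1.0 - p) ** n_reps
    # Truly absent, in which case zero detections is certain
    # (the model assumes no false-positive detections).
    absent = 1.0 - psi
    return present_but_missed / (present_but_missed + absent)


# Hypothetical comparison of a hard-to-capture and an easy-to-capture species.
for p in (0.05, 0.25, 0.55):
    prob = prob_present_given_undetected(psi=0.5, p=p, n_reps=4)
    print(f"p = {p:.2f}: Pr(present | never captured) = {prob:.3f}")
```

When the capture probability is low, a site with no detections remains nearly as likely to be occupied as it was before sampling; when the capture probability is high, the same observation is strong evidence of absence. The unobserved entries of the incidence matrix are weighted accordingly.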

In our approach, measures of biodiversity are estimated indirectly as functions of the estimated incidence matrix of the community. Thus, species richness and measures of alpha or beta diversity depend on a set of model-based estimates of species- and site-specific occurrences. This approach differs considerably from classes of statistical models wherein species richness is treated as a single random variable – usually a discrete random variable – that represents the aggregate contribution of all species in the community. This "top-down" view of a community may yield incorrect inferences if heterogeneity in detectability exists among species or if the effects of environmental covariates on occurrence differ among species, as illustrated in our analysis of the ant data.
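As a sketch of what "functions of the incidence matrix" means in practice, the example below computes site-specific species richness and a pairwise similarity between sites from a small, made-up incidence matrix. The matrix, the function names, and the choice of the Jaccard index as the similarity measure are assumptions for illustration only. In a Bayesian analysis, functions like these would be applied to each posterior draw of the incidence matrix so that uncertainty in occurrence carries through to the derived measure.

```python
def site_richness(z):
    """Number of species present at each site; z[i][j] = 1 if species i occurs at site j."""
    n_sites = len(z[0])
    return [sum(row[j] for row in z) for j in range(n_sites)]


def jaccard_similarity(z, j, k):
    """Shared species divided by species found at either site (NaN if both are empty)."""
    shared = sum(1 for row in z if row[j] and row[k])
    either = sum(1 for row in z if row[j] or row[k])
    return shared / either if either else float("nan")


# Hypothetical incidence matrix: 4 species (rows) by 3 sites (columns).
z = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
print(site_richness(z))             # [3, 3, 2]
print(jaccard_similarity(z, 0, 1))  # 0.5
print(jaccard_similarity(z, 0, 2))  # 0.25
```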

The inferential benefits of using hierarchical models to estimate measures of biodiversity are not free. As described earlier, the price to be paid for the ability to estimate probabilities of species occurrence and species detection is replication of presence-absence surveys within sample locations. In our opinion the improved understanding acquired in modeling the community at the level of individual species and the versatility attained by having accurate estimates of a community's incidence matrix far outweigh the cost of additional sampling. That said, there are other, perhaps less obvious, costs associated with these hierarchical models. Specifically, estimates of species richness and other community-level parameters may be sensitive to the underlying assumptions of these models, and these assumptions can be difficult to test using standard goodness-of-fit procedures. For example, the choice of distributions for modeling heterogeneity among species or sites may exert some influence on estimates of species richness. We assumed a bivariate normal distribution for the distribution of logit-scale, mean probabilities of occurrence and detection, but other distributions – even multimodal distributions – also might be useful. In single-species models of replicated, presence-absence surveys, estimates of occurrence are sensitive to the distribution used to specify heterogeneity in detection probabilities among sample sites (Dorazio 2007, Royle 2006); therefore, similar sensitivity can be expected in multispecies models, though this aspect of model adequacy has not been rigorously explored.
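To make the distributional assumption concrete, the sketch below draws species-level occurrence and detection probabilities from a bivariate normal distribution on the logit scale. The means, standard deviations, and correlation are hypothetical values, not estimates from the ant data, and the function names are our own. Replacing the body of `draw_species_probs` with another distribution is the kind of change whose effect on estimates of species richness would need to be examined.

```python
import math
import random


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_species_probs(n_species, mu, sigma, rho, seed=0):
    """Draw (occurrence, detection) probabilities for each species.

    mu and sigma are (occurrence, detection) pairs on the logit scale and
    rho is the logit-scale correlation; all values are assumed.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_species):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        logit_psi = mu[0] + sigma[0] * e1
        # Mixing e1 into the second coordinate induces correlation rho.
        logit_p = mu[1] + sigma[1] * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        draws.append((inv_logit(logit_psi), inv_logit(logit_p)))
    return draws


community = draw_species_probs(n_species=20, mu=(0.0, -1.5), sigma=(1.5, 1.0), rho=0.8)
for psi, p in community[:5]:
    print(f"occurrence = {psi:.3f}, detection = {p:.3f}")
```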

Another assumption of our model that is difficult to test is absence of false-positive errors in detection. In other words, if a species is detected (or captured), we assume that its identity is known with certainty. However, in surveys of avian or amphibian communities where species are detected by their vocalizations, misidentifications of species can and do occur (McClintock et al. 2010a,b, Simons et al. 2007). These misidentifications are even more common in circumstances where surveys are conducted by volunteers whose identification skills are highly variable (Genet & Sargent 2003). If ignored, false-positive errors in detection induce a positive bias in estimates of species occurrence because species are incorrectly "detected" at sites where they are absent. While it is possible to construct statistical models of presence-absence data that include parameters for both false-positive and false-negative detection errors (Royle & Link 2006), these models are prone to identifiability problems. To reduce these problems, Royle & Link (2006) recommended that the model's parameters be constrained to ensure that estimates of misclassification probabilities are lower than estimates of detection probabilities. This constraint, though sensible, does not provide a solution when the probabilities of misclassification and detection are nearly equal (McClintock et al. 2010b, Royle & Link 2006). The development of statistical models of species occurrence that include both false-positive and false-negative errors in detection, as well as unobserved sources of heterogeneity in both occurrence and detection probabilities, is an active area of research owing to the difficulties associated with aural detection methods.
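The identifiability problem can be seen in a minimal sketch of a single-species model with both kinds of error, written as a two-component binomial mixture. The parameter names `p11` (detection probability at an occupied site) and `p10` (misclassification probability at an unoccupied site), the function names, and the numerical values are assumptions for illustration. The likelihood of the detection count at a site is unchanged when the labels "occupied" and "unoccupied" are swapped, so without a constraint such as `p10 < p11` the data cannot distinguish the two labelings.

```python
from math import comb


def binom_pmf(y, n, q):
    """Binomial probability of y successes in n trials with success probability q."""
    return comb(n, y) * q ** y * (1.0 - q) ** (n - y)


def site_likelihood(y, n_reps, psi, p11, p10):
    """Marginal probability of y detections in n_reps surveys of one site."""
    occupied = psi * binom_pmf(y, n_reps, p11)            # true detections
    unoccupied = (1.0 - psi) * binom_pmf(y, n_reps, p10)  # false positives
    return occupied + unoccupied


# Hypothetical parameter values and their label-swapped counterpart.
n_reps = 4
for y in range(n_reps + 1):
    original = site_likelihood(y, n_reps, psi=0.3, p11=0.6, p10=0.1)
    swapped = site_likelihood(y, n_reps, psi=0.7, p11=0.1, p10=0.6)
    print(y, round(original, 6), round(swapped, 6))
```

The two columns agree for every possible count, which is why a constraint ordering the two probabilities is needed, and why that constraint carries little information when the two probabilities are nearly equal.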

The conceptual framework described in this paper is broadly applicable in ecological research and in assessments of biodiversity. Hierarchical, statistical models of multispecies, presence-absence data can be used to estimate current levels of biodiversity, as illustrated in our analysis of the ant data, or to assess changes (e.g., trends) in communities over time (Dorazio et al. 2010, Kéry, Dorazio, Soldaat, van Strien, Zuiderwijk & Royle 2009, Russell et al. 2009, Walls et al. 2011). The models of community change are especially relevant in ecological research because they provide an analytical framework wherein data may be used to confront alternative theories of metacommunity dynamics (Holyoak & Mata 2008, Leibold et al. 2004). Although a few classes of statistical models have been developed to infer patterns of co-occurrence among species (MacKenzie et al. 2004, Waddle et al. 2010), models for estimating the dynamics of interacting species (e.g., competitors or predators) from replicated, presence-absence data have not yet been formulated. Such models obviously represent an important area of future research.
