developed in the 1930s by Klein *et al.* (see, for example, Klein and Palmer, 1938). This index is applied to all the teeth (DMFT) or to all surfaces (DMFS), and represents the cumulative severity of dental caries experience for each individual. These scores have well-documented shortcomings regarding their ability to describe the intra-oral distribution of dental caries (Lewsey and Thomson, 2004), but they continue to be instrumental in evaluating and comparing the risks of dental caries across population groups. Most importantly, they remain popular in dental caries research because they allow historical comparisons in population-based studies.

Statistical analysis of dental caries data relies heavily on the research question under study. These questions can be classified into two groups. The first group comprises questions that can be answered using mouth-level outcomes generated from aggregated scores such as the DMF index. The second group comprises questions that necessitate the use of tooth-level or tooth-surface-level outcomes. A very important issue for the data analyst is the modeling strategy to adopt for the response variable under investigation. Broadly, two fairly different views are advocated. The first view, supported by large-sample properties, states that normal theory should be applied as much as possible, even to non-normal data such as counts (Verbeke and Molenberghs, 2000). This view is strengthened by the notion that normal models, despite being only one member of the family of generalized linear models (GLIM), are much further developed than any other GLIM (e.g., model checks and diagnostic tools), and that they enjoy unique properties (e.g., the existence of closed-form solutions, exact distributions for test statistics, unbiased estimators, etc.). Although this is correct in principle, it fails to acknowledge that normal models may not be adequate for some types of data. As an example, the abundance of zeros in DMF scores rules out any attempt to use normal models, such as linear models, even after a suitable transformation. While a transformation may normalize the distribution of nonzero response values, no transformation can spread out the zeros (Hall, 2000). A different modeling view is that each type of outcome should be analyzed using tools that exploit the nature of the data. For dental data, features to be accommodated include the discrete nature of the data (count responses for mouth-level data and binary responses for intra-oral data), the abundance of zeros, for example in DMF/S scoring, and the clustering in intra-oral responses. The clustering of participants as a result of the study design is another important feature.

This chapter reviews common parametric statistical models used to answer questions that arise in dental caries research, with an eye to discerning their relative strengths and limitations. Missing data problems arising in dental caries research will also be discussed briefly.

**2. Statistical models for mouth-level caries data**

Mouth-level data, resulting from the DMF index, are typically analyzed as unbounded or bounded counts. For unbounded counts, a Poisson regression model, or its extension the negative binomial regression model, which accounts for overdispersion in the data, is often used. For bounded counts, a binomial regression model is often advocated.

For unbounded counts, these models assume that the underlying distribution for the data is either a Poisson or a negative binomial distribution. The Poisson distribution is the simplest distribution for nonnegative discrete data, and is entirely specified by a single positive parameter, the mean. This mean is often related to potential explanatory variables using a log link function. Specifically, let Y denote the outcome variable and X the set of explanatory variables. A Poisson regression model for the mean is defined as $E(Y \mid X) = e^{\alpha + X\beta}$, where α and β are the intercept and the regression parameter vector associated with X. The probability mass function of Y is given by $P(Y = y \mid X) = e^{-\mu_x} \mu_x^{y} / y!$, y = 0, 1, …, where $\mu_x = E(Y \mid X)$ is the conditional mean, which depends on covariates.
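The Poisson regression model above can be sketched numerically as follows. This is a minimal illustration using only the Python standard library; the function names and the coefficient and covariate values are made up for the example and do not come from the chapter. It evaluates the mass function on the log scale and checks that the mean and the variance of the distribution both equal the conditional mean.

```python
import math

def poisson_mean(alpha, beta, x):
    """Conditional mean mu_x = exp(alpha + x . beta) under the log link."""
    return math.exp(alpha + sum(b * v for b, v in zip(beta, x)))

def poisson_pmf(y, mu):
    """P(Y = y | X) = exp(-mu) * mu**y / y!, evaluated on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_loglik(alpha, beta, xs, ys):
    """Log-likelihood of counts ys given covariate rows xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = poisson_mean(alpha, beta, x)
        total += -mu + y * math.log(mu) - math.lgamma(y + 1)
    return total

# Hypothetical coefficients and covariates, for illustration only.
mu = poisson_mean(0.5, [0.3, -0.2], [1.0, 2.0])
probs = [poisson_pmf(y, mu) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mu, mean, var)
```

In practice the log-likelihood would be maximized over α and β by a numerical optimizer; that step is omitted here.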

One major restriction of the Poisson regression model is that its mean is equal to its variance. For dental caries data, however, it is not uncommon for the variance to be much greater than the mean. For such data, a negative binomial regression model has been advocated as an alternative to Poisson regression models. It is typically used when the variability in the data cannot be properly captured by Poisson regression models. The negative binomial model is a conjugate mixture distribution for count data (Agresti, 2002). It is entirely specified by two parameters, its mean and the overdispersion parameter. Similarly to the Poisson regression model, the mean is related to potential explanatory variables using a log link function. However, the probability mass function of Y is given by:

$$P(Y = y \mid X) = \frac{\Gamma(y + \kappa^{-1})}{\Gamma(\kappa^{-1})\,\Gamma(y + 1)} \left(\frac{\kappa^{-1}}{\mu_x + \kappa^{-1}}\right)^{\kappa^{-1}} \left(1 - \frac{\kappa^{-1}}{\mu_x + \kappa^{-1}}\right)^{y}, \qquad y = 0, 1, \dots$$

where $\mu_x = E(Y \mid X) = e^{\alpha + X\beta}$ is the conditional mean, which depends on covariates, and κ is the overdispersion parameter. This distribution has variance $\mu_x + \kappa\mu_x^{2}$. The parameter κ is typically unknown and is estimated from the data to evaluate the extent of overdispersion. When κ tends to zero, the negative binomial distribution converges to a Poisson distribution (Agresti, 2002).
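The two properties just stated, the variance formula and the Poisson limit, can be checked numerically. The sketch below uses only the Python standard library; the function names and the values of the mean and of κ are illustrative assumptions, not estimates from the chapter.

```python
import math

def nb_pmf(y, mu, kappa):
    """Negative binomial mass function with mean mu and overdispersion kappa."""
    r = 1.0 / kappa              # kappa^{-1}
    p = r / (mu + r)             # kappa^{-1} / (mu + kappa^{-1})
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson mass function, used here as the kappa -> 0 reference."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Illustrative values only.
mu, kappa = 2.0, 0.5
probs = [nb_pmf(y, mu, kappa) for y in range(400)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, var, mu + kappa * mu ** 2)

# With a very small kappa the mass function is close to the Poisson one.
print(nb_pmf(3, mu, 1e-6), poisson_pmf(3, mu))
```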

The presence of an upper bound for possible values taken by DMF scores suggests a model based on the binomial rather than the Poisson distribution (Hall, 2000). Data are then viewed as being generated from a binomial process with m trials and success probability $\pi_X$. Here m represents the maximum number of teeth or tooth surfaces in the mouth susceptible to decay, and $\pi_X$ the probability for a tooth or tooth surface to present a sign of decay. The binomial model is given by:

$$P(Y = y \mid X) = \frac{\Gamma(m + 1)}{\Gamma(m - y + 1)\,\Gamma(y + 1)}\, \pi_X^{y}\, (1 - \pi_X)^{m - y}, \qquad y = 0, 1, \dots, m,$$

where the success probability is related to covariates as $\pi_X = \dfrac{e^{\alpha + X\beta}}{1 + e^{\alpha + X\beta}}$, with α and β being the intercept and the regression parameter vector associated with X. One should note, however, that Poisson and negative binomial distributions provide a reasonable approximation to the binomial distribution in dental caries research.
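To see why a Poisson distribution can stand in for the binomial when the decay probability is small, the following sketch compares the two distributions with matching means. It uses only the Python standard library. The value m = 20 and the linear predictor of -3 are illustrative assumptions, and the comparison with the product of m and the squared probability relies on the classical Le Cam inequality rather than on anything stated in the chapter.

```python
import math

def expit(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def binomial_pmf(y, m, pi):
    """Binomial mass function written with gamma functions, as in the text."""
    log_p = (math.lgamma(m + 1) - math.lgamma(m - y + 1) - math.lgamma(y + 1)
             + y * math.log(pi) + (m - y) * math.log(1.0 - pi))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def total_variation(m, pi):
    """Total variation distance between Binomial(m, pi) and Poisson(m * pi)."""
    mu = m * pi
    diff = sum(abs(binomial_pmf(y, m, pi) - poisson_pmf(y, mu))
               for y in range(m + 1))
    # Poisson mass above m, where the binomial has no support.
    tail = 1.0 - sum(poisson_pmf(y, mu) for y in range(m + 1))
    return 0.5 * (diff + tail)

m = 20                 # hypothetical number of teeth at risk
pi_x = expit(-3.0)     # hypothetical linear predictor alpha + X beta = -3
tv = total_variation(m, pi_x)
print(pi_x, tv, m * pi_x ** 2)
```

The distance shrinks as the decay probability decreases, which is the regime in which the unbounded-count models are a reasonable substitute.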

Dental caries data with excess zeros are common in statistical practice. For example, in young children, DMF scores generally generate an excessive number of zeros in that many children do not experience dental caries. This is typically due to a short exposure time to caries development. The limitations of Poisson and negative binomial regression models to analyze such data are well established (see, for example, Lambert, 1992; and Hall, 2000). One approach to analyze count data with many zeros is to use zero-inflated models. This class of models views the data as being generated from $P(Y = y \mid X)$, a mixture of a zero point mass and a non-degenerate homogeneous discrete distribution $P_1(Y = y \mid X)$, as follows:

$$P(Y = y \mid X) = \begin{cases} \omega + (1 - \omega)\,P_1(Y = y \mid X), & y = 0 \\ (1 - \omega)\,P_1(Y = y \mid X), & y > 0 \end{cases}$$

where $0 \le \omega \le 1$ represents the mixing probability that captures the heterogeneity of zeros in the population. The choice of the homogeneous distribution $P_1(Y = y \mid X)$ for the most part depends on the nature of the counts under consideration. For bounded counts, a binomial distribution is typically used (Hall, 2000). Poisson and negative binomial distributions are the standard for unbounded counts (Bohning et al., 1999). Ridout, Demetrio and Hinde (1998) provide an extensive review of this literature. In real applications of these models in dental caries research, the mixing probability is often related to covariates using, for example, a logistic model.
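The mixture above is straightforward to evaluate once a baseline distribution is chosen. The sketch below uses a Poisson baseline and a logistic mixing weight, with only the Python standard library; the function names, the linear predictor of -0.5 and the baseline mean of 2 are illustrative assumptions.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def zero_inflated_pmf(y, omega, base_pmf):
    """Mixture of a point mass at zero (weight omega) and base_pmf."""
    p1 = base_pmf(y)
    if y == 0:
        return omega + (1.0 - omega) * p1
    return (1.0 - omega) * p1

# Hypothetical values: logistic mixing weight and Poisson baseline mean.
omega = expit(-0.5)
mu = 2.0
base = lambda y: poisson_pmf(y, mu)

probs = [zero_inflated_pmf(y, omega, base) for y in range(80)]
mean = sum(y * p for y, p in enumerate(probs))
print(probs[0], base(0), mean)
```

Passing the baseline as a function keeps the mixture code unchanged when a negative binomial or binomial baseline is substituted.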

We illustrate below how some of these simple models can be applied to dental caries scores generated from a survey designed to collect oral health information on low-income African American children (0-5 years) living in the city of Detroit (see Tellez et al., 2006). This study aimed at promoting oral health and reducing its disparities within this community through the understanding of determinants of dental caries. Dental caries were measured using DMF scores, which represent the cumulative severity of the disease for each surveyed participant. Possible covariates include the study participant's age (AGE) and his/her sugar intake (SI). In Table 1, we present the fitted regression models applied to the children's data. For these data, the mean structure of the homogeneous model is specified as $E(Y \mid X) = e^{\alpha + X\beta}$, where X = (AGE, SI, AGE × SI), with AGE × SI being a multiplicative interaction, and $\beta = (\beta_1, \beta_2, \beta_3)'$. Parameter κ of the Negative Binomial model captures overdispersion in the data.


| Parameter | Homogeneous Poisson | Homogeneous Negative Binomial | Zero-inflated Negative Binomial (mixing weight depends on covariates) |
|---|---|---|---|
| $\alpha$ | 1.3994 (0.0209)\* | 1.3484 (0.0725)\* | 2.0158 (0.0676)\* |
| $\beta_1$ | 0.6981 (0.0193)\* | 0.9188 (0.0861)\* | 0.2350 (0.0679)\* |
| $\beta_2$ | 0.2696 (0.0203)\* | 0.2378 (0.0853)\* | 0.0573 (0.0695) |
| $\beta_3$ | -0.2790 (0.0219)\* | -0.3314 (0.0877)\* | -0.0728 (0.0739) |
| $\gamma_0$ | - | - | -0.6131 (0.1595)\* |
| $\gamma_1$ | - | - | -1.7191 (0.2276)\* |
| $\gamma_2$ | - | - | -0.2226 (0.1509) |
| $\gamma_3$ | - | - | 0.3163 (0.2022) |
| $\kappa$ | - | 2.6178 (0.1753)\* | 0.9295 (0.1058)\* |

\*: p-value<0.05

Table 1. Parameter estimates and (Standard errors) from a homogeneous Poisson model, a homogeneous Negative Binomial model, and a zero-inflated Negative Binomial model with covariate dependent mixing weights applied to DMF scores

As a basic starting model, a homogeneous Poisson regression model is fit and compared to a homogeneous Negative Binomial model. In view of the AIC, the homogeneous Negative Binomial model fits the data better than the Poisson model. This result is consistent with the overdispersion parameter κ in the homogeneous Negative Binomial model being statistically significant at the 5% level, suggesting that overdispersion cannot be ignored in these data. As a result, the standard errors of the parameter estimates in the mean model are larger under the homogeneous Negative Binomial model than under the homogeneous Poisson model. The homogeneous Negative Binomial model is further compared to a zero-inflated Negative Binomial model, which potentially accommodates extra zeros in the data. In the latter model, the mixing weight ω is related to covariates through a logistic model, $\omega = (1 + e^{-Z\gamma})^{-1}$, where Z = (1, AGE, SI, AGE × SI) and $\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)'$. In view of the AIC, this model provides a better representation of the data than the homogeneous Negative Binomial model. This is consistent with the finding in the literature that dental caries data in young children typically exhibit overdispersion in addition to zero-inflation (Bohning et al., 1999).
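The significance markers in Table 1 can be related to the reported estimates and standard errors with a simple Wald z-test. The sketch below assumes a two-sided normal approximation, which may not be the test the authors used, and takes two entries of the table as inputs: the overdispersion estimate of the homogeneous Negative Binomial model, 2.6178 (0.1753), and an unstarred mean-model coefficient of the zero-inflated model, 0.0573 (0.0695). For κ the null value lies on the boundary of the parameter space, so the normal approximation is only a rough guide there.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value of a Wald z-test of estimate = 0 (normal approximation)."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Entries copied from Table 1.
p_kappa_nb = wald_p_value(2.6178, 0.1753)    # starred in the table
p_coef_zinb = wald_p_value(0.0573, 0.0695)   # not starred in the table
print(p_kappa_nb, p_coef_zinb)
```

Under this approximation the first p-value falls far below 0.05 and the second above it, in line with the asterisks in the table.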

The zero-inflated regression models provide an interesting parametric framework to accommodate heterogeneity in a population. A prevailing concern, however, is that these models only accommodate an inflation of zeros in the population, whereas both inflation and deflation at zero arise in practical applications. Homogeneous models (Poisson and negative binomial regression models), when applied to data from the Detroit study, typically reveal an inflation of zeros (fewer children with no dental caries predicted than observed) for younger children and a deflation of zeros (more children with no dental caries predicted than observed) for older children. For such data, a model that captures only inflation of zeros may fail to properly represent heterogeneity in the population. This necessitates the use of models that can accommodate both inflation and deflation in the population. A good example of such models is the two-stage model, also known as the hurdle model (Mullahy, 1986). An alternative approach is to use the marginal distribution derived from the mixture distribution:

$$P(Y = y \mid X) = \begin{cases} \omega + (1 - \omega)\,P_1(Y = y \mid X), & y = 0 \\ (1 - \omega)\,P_1(Y = y \mid X), & y > 0, \end{cases}$$

where $\dfrac{-P_1(Y = 0 \mid X)}{1 - P_1(Y = 0 \mid X)} \le \omega \le 1$. Note here that the constraints on the mixing weight are obtained only by imposing that $0 \le P(Y = y \mid X) \le 1$ for all y. The mixing weight is allowed to be negative in order to accommodate deflation of zeros in the data. For this class of models, the marginal mixture model maintains its hierarchical representation only if the mixing weight is bounded between 0 and 1. When the mixing weight is negative, the marginal mixture model loses its hierarchical representation.
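The role of the lower bound on the mixing weight can be illustrated with a short sketch. It uses a Poisson baseline with an arbitrary mean of 2 and only the Python standard library; the function names and numerical values are assumptions made for the example. A negative weight lowers the probability of a zero below its baseline value, and at the lower bound that probability is exactly zero.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def omega_lower_bound(p0):
    """Smallest admissible mixing weight given the baseline P1(Y = 0 | X) = p0."""
    return -p0 / (1.0 - p0)

def zero_altered_pmf(y, omega, base_pmf):
    """Marginal mixture allowing zero inflation (omega > 0) or deflation (omega < 0)."""
    p0 = base_pmf(0)
    if not (omega_lower_bound(p0) <= omega <= 1.0):
        raise ValueError("omega outside the admissible range")
    p1 = base_pmf(y)
    return omega + (1.0 - omega) * p1 if y == 0 else (1.0 - omega) * p1

mu = 2.0                                  # illustrative baseline mean
base = lambda y: poisson_pmf(y, mu)
lower = omega_lower_bound(base(0))

deflated = [zero_altered_pmf(y, -0.1, base) for y in range(80)]
print(lower, base(0), deflated[0], zero_altered_pmf(0, lower, base))
```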

Finally, the models described above are basic starting models and should be extended to accommodate unique features of the data under consideration. For example, it is often the case that the sampling design used to recruit study participants leads to clustered data. In survey research, sampled subjects living in the same neighborhood are more likely to share common, typically unmeasured, predispositions or characteristics that lead to dependent data. This necessitates the use of models for clustered or correlated data. An example of such models is described by Todem et al. (2010) for the analysis of dental caries in low-income African American children under the age of six living in the city of Detroit. These authors extended the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models were formulated. The first model assumed a shared random effects term between the logistic model of the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxed the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two correlated random effects variables. Under the conditional independence assumption and the missing at random assumption, a direct optimization of the marginal likelihood and an EM algorithm were proposed to fit these models.

under the conditional independence assumption. In dental caries research, data collected at the tooth level or tooth-surface level are typically binary outcomes representing the presence or absence of decay. For such data, a logistic regression model with random effects is typically used. In this class of models, fixed-effects regression parameters have a subject-specific interpretation, conditional on random effects (Verbeke and Molenberghs, 2000). That is, they have a direct and meaningful interpretation only for covariates that change within the cluster (the subject's mouth), such as the location of a tooth or a tooth surface in the mouth. The probabilities of tooth and tooth-surface decay are conditional on the random effects and can be used to capture changes occurring within a particular subject's mouth. To assess changes across all subjects' mouths, the modeler is then required to integrate out the random effects from the quantities of interest. Generalized linear mixed effects models are likelihood-based and therefore can be highly sensitive to any distributional misspecification. But they are known to be robust against less restrictive missing data mechanisms (Little and Rubin, 1987).

ii. Generalized estimating equations models

Although there are a variety of standard likelihood-based models available to analyze data when the outcome is approximately normal, models for discrete outcomes (such as binary outcomes) generally require a different methodology. Kung-Yee Liang and Scott Zeger (1986) proposed the so-called generalized estimating equations (GEE) approach, which is an extension of generalized linear models to correlated data. The basic idea of this family of models is to specify a function that links the linear predictor to the mean response, and to use a set of estimating functions with any working correlation model for parameter estimation. A sandwich estimator that corrects for any misspecification of the working correlation model is then used to compute the standard errors of the parameter estimates. GEE-based models are very popular as an all-round technique to analyze correlated data when the exact likelihood is difficult to specify. One of the strong points of this methodology is that the joint distribution of the data does not need to be fully specified to guarantee consistent and asymptotically normal parameter estimates. Instead, a working correlation model between the clustered observations is required for estimation. GEE regression parameter estimates have a population-averaged interpretation, analogous to those obtained from a cross-sectional data analysis. This property makes GEE-based models desirable in population-based studies, where the focus is on average effects, with the within-subject association viewed as a nuisance term.

The GEE approach has several advantages over a likelihood-based model. It is computationally tractable in applications where the parametric approaches are computationally very demanding, if not impossible. It is also less sensitive to distributional misspecification than full likelihood-based models. A major limitation of GEE-based models, at least in their original 1986 formulation, is that they require a more stringent missing data mechanism to produce valid inferences.

**4. Conclusion**

As the search for effective measures for the prevention and treatment of dental caries continues, it is essential that we have effective, robust and rigorous statistical methods to help our understanding of the condition. This chapter has reviewed common statistical
