**4. Conclusion**

As the search for effective measures for the prevention and treatment of dental caries continues, it is essential that we have effective, robust and rigorous statistical methods to improve our understanding of the condition. This chapter has reviewed common statistical models for answering questions involving intra-oral and mouth-level outcomes, with an eye to discerning their relative strengths and limitations. Models for mouth-level data, such as DMF scores, are essentially count regression techniques. When there are excess zeros, these models are often extended to two-component distributions, which view the data as being generated from a mixture of a point mass at zero and a non-degenerate discrete distribution. Models for intra-oral outcomes are primarily correlated models for binary data, such as generalized linear mixed effects models and generalized estimating equation (GEE) models. These models can account for the multilevel data structure (e.g., teeth within a quadrant and quadrants within the mouth), which generates a very complex and unique correlation structure (Zhang et al., 2010). Despite their merits in accounting for this correlation structure, these models need to be adapted to accommodate other unique features of intra-oral caries data. Intra-oral data present a unique set of challenges to statistical analysis, including, but not limited to, large cluster sizes and informative cluster sizes, where the number of teeth present in a mouth is itself related to caries experience (Leroux, 2006). More generally, models for intra-oral and mouth-level outcomes need to be adapted to the study design. For example, when the study design involves a longitudinal component, the model needs to account for the repeated measurements over time.

Another important issue that needs to be accounted for is missing data. This problem is commonly encountered throughout statistical work and is almost always present in the analysis of dental caries data. Incomplete data can have a dramatic impact on inferences if they are not properly investigated. Using the terminology of Little and Rubin (1987), missing data mechanisms are classified as missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) if missingness is allowed to depend on (1) none of the outcomes, (2) the observed outcomes only, or (3) the unobserved outcomes as well, respectively. GEE-based models, at least in their original formulation (Liang and Zeger, 1986), require the more stringent MCAR mechanism to produce valid inferences. Weighted GEE-based models have been proposed to accommodate the less stringent MAR mechanism (Robins et al., 1995). Likelihood-based models, such as generalized linear mixed effects models, are known to remain valid under the less restrictive MAR mechanism. When the missingness mechanism depends on the unobserved outcomes, however, both classes of regression models are likely to produce biased inferences. For example, caries data that are missing because teeth are missing are likely to be informative, in that a missing tooth may indicate severe decay in that tooth before its loss. For such data, ignoring the missingness may lead to biased inferences. When an MNAR mechanism is suspected, models that incorporate the information from both the outcome process and the missing data process into a unified estimating function have been advocated (Diggle and Kenward, 1994; Molenberghs et al., 1997). This approach has provoked considerable debate about the role of such models in understanding the true data-generating mechanism.

The original enthusiasm was followed by skepticism about the strong and untestable assumptions on which this type of model rests (Verbeke et al., 2001). Specifically, joint models for the outcomes and the missingness process are typically not identifiable from the observed data at hand, so quantitative restrictions have to be imposed to recover identifiability. Conventional restrictions result from considering a minimal set of parameters, called sensitivity parameters, conditional upon which the remaining parameters are assumed identifiable. Varying the sensitivity parameters therefore produces a range of models, which forms the basis of sensitivity analysis (Vach and Blettner, 1995).

Well-established parametric models for dental caries data can be fit with most common statistical software, including but not limited to SAS, S-PLUS, R and SPSS. Options are, however, limited for the newly developed models that have emerged in the literature. For recent statistical models in dental caries research to be accepted and used widely, reliable and user-friendly software should be readily available to perform such regression analyses routinely. The software should be time-efficient, well-documented and, most importantly, have a friendly interface. Once these regression models are implemented, they will help answer both mouth-level and intra-oral questions in population-based research.

**Keywords:** Generalized estimating equation models, Generalized linear mixed effects models, Negative Binomial models, Poisson models, Zero-inflated models for count data

**5. References**

Agresti, A. (2002). *Categorical Data Analysis*, Second Edition. New York: John Wiley and Sons.

Bohning, D., Dietz, E., Schlattmann, P., Mendonca, L. and Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. *Journal of the Royal Statistical Society: Series A (Statistics in Society)* 162, 195–209.

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. *Journal of the American Statistical Association* 88, 9–25.

Diggle, P. and Kenward, M. G. (1994). Informative dropout in longitudinal data analysis (with discussion). *Applied Statistics* 43, 49–93.

Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. *Biometrics* 56, 1030–1039.

Klein, H. and Palmer, C. (1938). Studies on dental caries vs. familial resemblance in the caries experience of siblings. *Public Health Reports* 53, 1353–1364.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. *Technometrics* 34, 1–14.

Leroux, B. (2006). Analysis of correlated dental data: challenges and recent developments. *Statistical Methods for Oral Health Research, JSM 2006*.

Lewsey, J. D. and Thomson, W. M. (2004). The utility of the zero-inflated Poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. *Community Dentistry and Oral Epidemiology* 32, 183–189.

Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. *Biometrika* 73, 13–22.

Little, R. and Rubin, D. (1987). *Statistical Analysis with Missing Data*. New York: John Wiley and Sons.

Molenberghs, G., Kenward, M. G. and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with non-random dropout. *Biometrika* 84, 33–44.

Mullahy, J. (1986). Specification and testing of some modified count data models. *Journal of Econometrics* 33, 341–365.

Ridout, M., Hinde, J. and Demetrio, C. G. B. (2001). A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. *Biometrics* 57, 219–223.



Robins, J., Rotnitzky, A. and Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. *Journal of the American Statistical Association* 90, 106.

Tellez, M., Sohn, W., Burt, B. A. and Ismail, A. I. (2006). Assessment of the relationship between neighborhood characteristics and dental caries severity among low-income African-Americans: a multilevel approach. *Journal of Public Health Dentistry* 66, 30–36.

Todem, D. (2008). Oral health. In S. Boslaugh (ed.), *Encyclopedia of Epidemiology* (vol. 2, pp. 762–764). Los Angeles: Sage Publications.

Todem, D., Zhang, Y., Ismail, A. and Sohn, W. (2010). Random effects regression models for count data with excess zeros in caries research. *Journal of Applied Statistics* 37(10), 1661–1679.

Vach, W. and Blettner, M. (1995). Logistic regression with incompletely observed categorical covariates: investigating the sensitivity against violation of the missing at random assumption. *Statistics in Medicine* 14, 1315–1329.

Verbeke, G. and Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. New York: Springer-Verlag.

Verbeke, G., Molenberghs, G., Thijs, H., Lesaffre, E. and Kenward, M. (2001). Sensitivity analysis for nonrandom dropout: a local influence approach. *Biometrics* 57, 7–14.

Zhang, Y., Todem, D., Kim, K. and Lesaffre, E. (2011). Bayesian latent variable models for spatially correlated tooth-level binary data in caries research. *Statistical Modelling* 11(1), 25–47.

**Part 2**

**The Diagnosis of Caries**

**6**

**Traditional and Novel Caries Detection Methods**

Michele Baffi Diniz¹, Jonas de Almeida Rodrigues² and Adrian Lussi³

*¹Cruzeiro do Sul University, Brazil*

*²Federal University of Rio Grande do Sul, Brazil*

*³University of Bern, Switzerland*

**1. Introduction**

Dental caries is a bacteria-associated progressive process of the hard tissues of the coronal and root surfaces of teeth. The net demineralization may begin soon after tooth eruption in caries-susceptible children without being recognized by dental professionals. This process may progress further, resulting in a caries lesion, which is the sign and/or symptom of the carious process. In other words, caries is a continuum, which may be assessed incorrectly when only a single time point is considered. Figure 1 shows different stages of the carious process.

Fig. 1. (A) Sound occlusal surface. (B–D) Caries process in different stages.

Caries diagnosis implies more than just detecting lesions. Consequently, caries diagnosis, as an intellectual process, is the determination of the presence and extent of a caries lesion. Furthermore, the judgement of its activity is an integral part of diagnosis.

Since diagnosis is a mental resting place on the way to a treatment decision, it is intimately linked with the treatment plan to be followed. Thus diagnosis must include an assessment
