#### **3.4 Sample size justification**

Completing this process implies that we have carefully considered the hypothesis test and analysis variables, ultimately arriving at a statistical model that will rigorously address the research question. Sample size assessments will differ according to the statistical approach proposed to test the hypothesis, and should incorporate previously established public health or clinical information.

If the statistical approach entails adjustment for confounding and other sources of bias, the sample size calculation is often straightforward. Suppose we plan to test the significance of the treatment effect, *γ*, previously defined in Model (1), and we have already identified measured confounders (i.e., covariates) that should be included in the model, referred to as X1, ⋯, XK. Our null hypothesis corresponds to *γ* = 0, while our alternative hypothesis corresponds to *γ* ≠ 0. Testing this hypothesis corresponds to determining sample size/power for a multiple linear regression model [20].

We now reconsider the importance of sample size justification for analyses involving a large registry. Statistical significance depends on the sample size and is typically declared if the *P* value obtained from the test statistic falls below a predetermined threshold (e.g., 0.05). This type of significance may be reached in any study, provided that the sample size is large enough; therefore, in addition to this mathematical criterion, we recommend specifying conditions that must be met to achieve practical (public health or clinical) significance within the context of the research question. In biomedical studies, these criteria can often be defined by determining the minimal clinically important difference (MCID). This technique was originally proposed for clinical trials [21] but has spawned several other approaches [22] to determine the MCID. Once we incorporate the MCID into our null and alternative hypothesis statements, we can perform the sample size calculation that corresponds to our proposed inferential analysis.
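As an illustration, a normal-approximation formula for a two-group comparison of means can translate an MCID into a per-group sample size. The sketch below is a simplified, unadjusted calculation (a regression-based calculation would additionally account for the covariates X1, ⋯, XK); the MCID of 5 FEV1% predicted and the standard deviation of 10 are hypothetical values, not figures from this chapter.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mcid, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group
    comparison of means, powered to detect the MCID."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / mcid) ** 2)

# Hypothetical inputs: MCID of 5 FEV1% predicted, outcome SD of 10
print(n_per_group(mcid=5, sd=10))  # 63 per group
```

Raising the desired power (e.g., to 0.90) or shrinking the MCID increases the required sample size, which is rarely a constraint in a large registry.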

### **3.5 Missing data mechanisms and missing data modeling**

Missing data can occur in the registry setting for a variety of reasons. Simply put, a missing data point is an observation that should have been recorded; however, for some reason, it was not recorded. It is our desire, as analysts, to understand the reason for this "missingness." In this section, we outline practical analytic approaches to identify potential sources attributable to missing data and methods to combat the resulting bias. We begin with a brief description of the three fundamental missing data mechanisms. For an elegant mathematical treatment of the distinctions among the mechanisms, we refer the reader to the original work by Rubin [23].

#### *3.5.1 Missing completely at random (MCAR)*

If the registry data are MCAR, then the reason for missingness is not related to the data that we were able to observe or to the data that we were not able to observe. In the CF example, MCAR would mean that the probability of a lung function observation (the outcome variable) being missing from the registry does not depend on any of the observed data (e.g., the patient's age) or any of the unobserved data (e.g., having lower lung function does not alter the risk of the observation being missing). Our analysis results from this subset of data will be no different (aside from larger standard errors) than if we had been able to perform the analysis on the entire dataset.

*Evaluating Clinical Effectiveness with CF Registries DOI: http://dx.doi.org/10.5772/intechopen.84269*


#### *3.5.2 Missing at random (MAR)*

This assumption is more relaxed than MCAR but still has specific requirements. For MAR to hold, the missingness cannot be related to unobserved data, given what we have been able to observe. In other words, the missingness can depend upon data that we have already observed (i.e., data entries that were recorded in the registry). Referring again to our CF example, the probability of a lung function observation being missing does not depend upon the actual lung function value, provided that we have the other covariate data. In this case, missingness can depend upon characteristics that have been recorded in the CFFPR (e.g., gender).

#### *3.5.3 Missing not at random (MNAR)*

We are more likely to encounter this mechanism in registry data, compared to the other mechanisms. If data are MNAR, then the missingness is related to unobserved data (unlike MAR). The missing observations follow a different distribution than the observed data, regardless of whether the two types of data share other characteristics. Despite the fact that we have registry data, the data that we are able to observe are not representative of the entire population. Within the CFFPR example, consider the longitudinal data. According to CF Foundation guidelines, patients are supposed to have at least one pulmonary function test per quarter [5]. Suppose there is a subset of patients who do not have lung function data recorded at every clinical encounter. There are many plausible explanations for why these data are missing. For an individual patient, there may be a lack of interest in managing their disease progression, or it could be an entry error. In general, the missingness may have no relation to the observed values, or any such relations may be insufficient to explain it.

In practice, we do not have the information necessary to declare the reason for the missingness. Even thoughtfully developed, well-maintained registries will have missing data; therefore, sensitivity analyses are needed as part of the statistical considerations. As a preliminary step, we recommend creating an indicator (dummy) variable that equals 1 if the observation is missing and 0 otherwise. Regress this dichotomous variable on the other variables to determine whether the missing indicator is associated with observed characteristics. If no association is found, we may conclude that the data are MCAR; however, we still encourage caution when making the MCAR assumption for statistical models using registry data. Although a small sample size may produce this result, it is not a likely culprit in settings with large data sources. It is possible that the extent of the missingness may be too low (e.g., 5% of observations are missing) to substantially alter results, but having a low proportion of missing observations is also unlikely in a registry setting. If there is a significant association from our preliminary regression with the indicator variable, then we can rule out the MCAR assumption and more intently investigate the MAR and MNAR assumptions.
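The preliminary indicator check can be sketched as follows. For brevity, this simulation (with entirely hypothetical age and FEV1 values) substitutes a two-sample z-test, comparing one observed covariate between missing and non-missing records, for the full regression of the indicator on all covariates; the logic is the same, since an association between missingness and observed data argues against MCAR.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)

# Hypothetical registry rows: (age, fev1). fev1 is None when missing,
# and missingness depends on the *observed* age (an MAR pattern).
rows = []
for _ in range(2000):
    age = random.uniform(6, 21)
    is_missing = random.random() < (0.05 + 0.02 * (age - 6))  # older -> more missing
    fev1 = None if is_missing else random.gauss(85, 15)
    rows.append((age, fev1))

# Step 1: the indicator variable, 1 = missing, 0 = observed.
ind = [1 if fev1 is None else 0 for _, fev1 in rows]

# Step 2 (simplified): compare mean age between missing and observed
# records with a two-sample z-test.
age_miss = [a for (a, f), m in zip(rows, ind) if m == 1]
age_obs = [a for (a, f), m in zip(rows, ind) if m == 0]
se = (stdev(age_miss) ** 2 / len(age_miss) + stdev(age_obs) ** 2 / len(age_obs)) ** 0.5
z = (mean(age_miss) - mean(age_obs)) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p = {p:.4f}")  # small p -> evidence against MCAR
```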


We can further examine the MAR assumption by checking for variables that are often missing simultaneously or for other potential patterns of missingness. Whenever possible, we recommend performing the analysis under the MAR assumption. The two most common approaches under this mechanism are direct modeling and multiple imputation. Direct modeling implies that we will consider all available data points in our parameter estimation. This method is sometimes referred to as "available case analysis" [24]. In other words, the analysis will not exclude the records of any individual subject who has at least one observed entry. The second approach, multiple imputation [25], has gained favor among analysts with the expansion of computing resources. To perform this approach, several plausible values are generated for each missing data point, resulting in several distinct completed datasets. We employ our proposed statistical model separately on each dataset and obtain parameter estimates. The estimates are then combined to produce an aggregate estimate, and the aggregate estimate and its standard error are used to interpret the results. This technique is available in many software packages (e.g., SAS proc mi, proc mianalyze).
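The combination step of multiple imputation follows Rubin's rules: average the per-imputation estimates, and add the between-imputation variance to the average within-imputation variance. A minimal sketch, in which the five treatment-effect estimates and their squared standard errors are hypothetical:

```python
def pool_estimates(estimates, variances):
    """Combine per-imputation results with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled estimate
    w_bar = sum(variances) / m                               # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, total_var ** 0.5                           # estimate, pooled SE

# Hypothetical estimates (and squared SEs) from m = 5 imputed datasets:
est, se = pool_estimates([-2.1, -1.8, -2.4, -2.0, -1.9],
                         [0.30, 0.28, 0.33, 0.29, 0.31])
print(f"pooled estimate = {est:.2f}, SE = {se:.2f}")  # -2.04, 0.60
```

Note that the pooled standard error exceeds what a single imputed dataset would give, reflecting the uncertainty introduced by the missing data.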

Unfortunately, there is no way to know from the observed data alone whether the data are MAR or MNAR. Previous work by experts in the analysis of missing data has shown that any model we develop under the MNAR assumption will have an equivalent MAR counterpart [26]. Developing an MNAR model requires technical steps that are beyond the scope of the current chapter. Dmitrienko et al. [27] provide an applied approach to investigating MNAR assumptions in the context of sensitivity analyses. Although their text focuses on analyses of data from clinical trials, their approach and accompanying SAS implementation may be adapted to registry data analyses.
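One widely used sensitivity-analysis device for MNAR is a delta adjustment (in the spirit of the tipping-point analyses described by Dmitrienko et al., though not their SAS implementation): shift the MAR-based imputed values by progressively larger offsets and record where the study conclusion changes. All values below are fabricated for illustration, and the test is a simple normal approximation:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical FEV1 declines: observed patients plus MAR-based
# imputations for the patients with missing follow-up.
observed = [-3.0, -2.0, -1.5, -2.5, -1.0, -2.2]
imputed = [-2.1, -1.8, -2.4, -1.9, -2.2, -1.6]

def significant_decline(data, alpha=0.05):
    """Two-sided one-sample z-test (normal approximation) of mean = 0."""
    se = stdev(data) / len(data) ** 0.5
    z = mean(data) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

# Delta adjustment: make the imputed values progressively less favorable
# and record where the MAR-based conclusion tips.
for delta in [0, 1, 2, 3, 4]:
    data = observed + [y + delta for y in imputed]
    print(delta, significant_decline(data))  # tips between delta = 2 and 3
```

If a conclusion only tips at implausibly large deltas, it is robust to the MAR assumption; an early tipping point warrants caution.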

#### **3.6 Interpretations of registry data analyses**

To simplify interpretation and improve the accuracy of the results, sources of potential confounding (measured or unmeasured) should be considered as far in advance as possible. Propensity score regression offers an effective method to further balance the treatment and non-treatment groups. Like multivariable regression, this approach accounts for treatment-selection bias [28] only for measured confounders (e.g., measured comorbidities and severity of illness). The propensity score can utilize measured confounders to remove treatment-selection bias; however, when unmeasured confounders determine treatment-selection bias, the propensity-score approach will be limited. In analyzing registry data, instrumental variable (IV) analyses should be considered when unmeasured confounders are suspected.
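Once propensity scores have been estimated (e.g., from a logistic regression of treatment on measured confounders, as in Eq. (2)), one common use is quintile stratification: compare outcomes within strata of patients with similar scores, then pool the stratum-specific differences. A minimal sketch with synthetic data (the function and the constant treatment effect of 2 are fabricated for illustration):

```python
def stratified_effect(scores, treated, outcomes, n_strata=5):
    """Pool within-quintile differences in mean outcome between
    treated and untreated patients, weighted by stratum size."""
    data = sorted(zip(scores, treated, outcomes))  # order by propensity score
    n = len(data)
    effects, weights = [], []
    for k in range(n_strata):
        stratum = data[k * n // n_strata:(k + 1) * n // n_strata]
        t = [y for _, tr, y in stratum if tr == 1]
        c = [y for _, tr, y in stratum if tr == 0]
        if t and c:  # a stratum must contain both groups to contribute
            effects.append(sum(t) / len(t) - sum(c) / len(c))
            weights.append(len(stratum))
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)

# Synthetic check: a constant treatment effect of 2 should be recovered.
scores = [i / 100 for i in range(100)]
treated = [i % 2 for i in range(100)]
outcomes = [5 + 2 * t for t in treated]
print(stratified_effect(scores, treated, outcomes))  # 2.0
```

Strata lacking either treated or untreated patients carry no information about the contrast and are dropped, which is why overlapping score distributions (as in the illustrative application below) matter.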

Although the IV analysis is a powerful approach, this method has some noteworthy constraints. A large sample size is essential for performing IV analysis, but this issue may not be a challenge in the registry setting. The IV must (i) affect treatment assignment and (ii) have no direct association with the outcome. If these assumptions are satisfied, then the IV analysis will yield a consistent estimate of the average causal effect [29]. Assumption (i) is directly testable, but making a heuristic argument for assumption (ii) is a common approach; see Kahn et al. [30] for an example. A weak IV will produce larger standard errors and may lead to incorrect inferential results. This approach is ideal in the presence of small/moderate confounding but becomes less reliable in the presence of large confounding. Admittedly, this is a limitation of the IV analysis in the registry setting. On the other hand, an appropriate IV minimizes the potential impacts of measured and unmeasured confounding [31].
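For a binary IV, the simplest estimator is the Wald ratio: the IV's effect on the outcome divided by its effect on treatment uptake. The sketch below uses fabricated, noise-free data with a built-in treatment effect of 3; note how a weak IV (a small first-stage difference in the denominator) would inflate the estimate's variability.

```python
def wald_iv_estimate(z, t, y):
    """Binary-IV (Wald) estimator: the IV's effect on the outcome
    divided by its effect on treatment uptake."""
    def mean_where(v, flag):
        vals = [vi for vi, zi in zip(v, z) if zi == flag]
        return sum(vals) / len(vals)
    numerator = mean_where(y, 1) - mean_where(y, 0)    # reduced-form effect
    denominator = mean_where(t, 1) - mean_where(t, 0)  # first stage (IV strength)
    return numerator / denominator

# Fabricated data: a hypothetical binary instrument, the treatment
# actually received, and an outcome driven only by treatment.
iv = [1, 1, 1, 1, 0, 0, 0, 0]
rx = [1, 1, 1, 0, 1, 0, 0, 0]
fev = [3.0 * t for t in rx]
print(wald_iv_estimate(iv, rx, fev))  # 3.0
```

Assumption (i) is checked directly through the denominator (the first-stage effect); assumption (ii) cannot be verified from the data and must be argued substantively.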

Sensitivity analyses should be performed to examine potential impacts of missing data and particular subgroups that may drive inferential results. Analyses corresponding to the missing at random assumption should be explored in the registry setting. Subgroup analyses are essential to identify heterogeneous treatment effects, particularly in the IV analysis. These sensitivity analyses should be performed regardless of the statistical model that we choose to employ.

## **4. Illustrative application**

#### **4.1 Data summary and descriptive analysis**

The CFFPR contains data on individuals receiving care from any CF center in the United States that has been accredited by the CF Foundation. As with many registries, we underwent an application process to receive the data. The CFFPR data that we received were in separate databases; we used the following two. The encounter-level database had one record per patient, per clinical encounter. The annual-level database contained one record per patient, per year. We merged these data to extract the information necessary to determine whether there is a significant association between the use of inhaled tobramycin and lung function in individuals with CF who are chronically infected with *Pa*. Our primary outcome, lung function, was defined as mean change in FEV1% predicted (FEV1). In this application, we study the short-term effectiveness of inhaled tobramycin, in order to facilitate the use of instrumental variables, which still pose several challenges in longitudinal settings with multiple data points and time-varying exposures [17].

We considered the following restrictions to target the study cohort of interest. We requested CFFPR data ranging from January 1, 1998 to December 31, 2009, in order to capture the time at which inhaled tobramycin (Tobi) was recorded in the registry on a consistent basis. We did not consider study records for individuals <6 years of age, due to limitations of the modalities used to measure lung function in young children. We limited the maximum age to 21 years, in an effort to focus on the first occurrence of chronic *Pa*. We identified the first chronic *Pa* infection for each individual by examining all *Pa* culture results available in the encounter-level data. Patients recorded as having a positive *Pa* culture more than 50% of the time in a given year were considered eligible for the study; this was determined using the *Pa* culture (indicator) variable available in the CFFPR. We took the first year in which the patient had chronic *Pa* infection as the baseline year. In an effort to keep our study data to one record per patient, we only considered the first chronic *Pa* infection for each patient. Patients who also had another infection at the same time, *Burkholderia cepacia* complex, were not considered part of the study cohort, because of previously established criteria [32]. An indicator variable for patient-level tobramycin use was defined as receiving inhaled tobramycin within 6 months of initial chronic *Pa*. Baseline FEV1 was defined as the closest FEV1 measurement recorded within 6 months after the initial chronic *Pa* record. Follow-up FEV1 was defined as the closest recorded FEV1 within 1.5–2.5 years of the baseline FEV1. Patients who did not have a recorded FEV1 measurement within 6 months after meeting criteria for chronic *Pa* infection were excluded. The outcome variable, decline in FEV1, was calculated as the difference between follow-up and baseline FEV1 for each patient. A negative value implies that FEV1 declined over the 2-year period; a positive value indicates that FEV1 increased over the 2-year period. **Figure 2** illustrates the steps to determining the analysis cohort and resulting sample size.
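The chronic-*Pa* eligibility rule described above (>50% positive cultures in a given year, with the first such year taken as baseline) can be sketched as a small helper; the function name and the culture records are hypothetical:

```python
from collections import defaultdict

def first_chronic_pa_year(cultures):
    """cultures: list of (year, positive) culture results for one patient.
    Returns the first year with >50% positive Pa cultures, or None."""
    by_year = defaultdict(list)
    for year, positive in cultures:
        by_year[year].append(positive)
    for year in sorted(by_year):
        results = by_year[year]
        if sum(results) / len(results) > 0.5:  # majority of cultures positive
            return year
    return None

# Hypothetical encounter-level records (year, Pa-positive indicator):
records = [(2001, 0), (2001, 0), (2001, 1),
           (2002, 1), (2002, 1), (2002, 0),
           (2003, 1), (2003, 1)]
print(first_chronic_pa_year(records))  # 2002
```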


We identified potential confounders by looking at previous literature (see [6], for example). These variables, measured in the CFFPR, included gender and baseline measurements for age, FEV1, weight-for-age percentile, insurance coverage, CF-related diabetes (with or without fasting hyperglycemia), dornase alfa use, pancreatic insufficiency (defined as taking pancreatic enzymes), and number of hospitalizations in the preceding year. We can compare the Tobi and non-Tobi groups with respect to each of these variables using basic inferential testing (i.e., a nonparametric test for continuous variables and a Chi-square test for categorical variables). Results of the descriptive analysis are presented in **Table 1**. Our descriptive analysis reveals that the Tobi and non-Tobi groups differed by several demographic and clinical characteristics. We note that the groups did not differ according to age or pancreatic insufficiency. Next, we utilize the aforementioned statistical models to test this association.

#### **Figure 2.**

*Diagram of study population in the illustrative CF example, showing inclusion and exclusion steps to obtain an analysis cohort from the registry. CFFPR, Cystic Fibrosis Foundation Patient Registry;* Pa*,* Pseudomonas aeruginosa*.*
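The Chi-square comparison for a categorical characteristic can be sketched as follows; the 2 × 2 counts are fabricated for illustration (a statistic above 3.84, the 1-df critical value at α = 0.05, would indicate a group difference):

```python
def chi_square_2x2(table):
    """Pearson Chi-square statistic for a 2x2 table of counts,
    table = [[a, b], [c, d]] (rows = groups, columns = categories)."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Fabricated counts: rows are Tobi / non-Tobi groups, columns are yes / no
# for a categorical characteristic such as dornase alfa use.
print(round(chi_square_2x2([[120, 80], [90, 110]]), 2))  # 9.02
```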

#### **4.2 Multiple linear regression**

We use Model (1) to test the association between lung function and tobramycin use, adjusting for potential confounders as covariates, represented as X1, ⋯, XK. **Table 2** shows the results of the multiple linear regression, which suggest that the treated group experienced greater mean decline in FEV1% predicted than the untreated group. Although most covariates were statistically significant at *P* < 0.05, we found that CF-related diabetes, pancreatic insufficiency, and dornase alfa use were not significant predictors of outcome.

#### **Table 2.**

*Multiple linear regression and propensity score method to predict lung function decline.*

*Abbreviations: CF, cystic fibrosis; FEV1, percentage predicted of forced expiratory volume in 1 s.*

*<sup>a</sup> For each categorical variable in the first-stage model, the coefficient is the difference in patient tobramycin use between the indicated category and the reference category (labeled as coefficient = 0). For each continuous variable, it is the change in patient tobramycin use when the variable is increased by 1 unit. A negative value implies decreased patient tobramycin use.*

*<sup>b</sup> Predicted treatment obtained in Stage 1 serves as the propensity score in Stage 2. For each categorical variable in the second-stage model, the coefficient is the difference in FEV1 decline between the indicated group and the reference group (labeled as coefficient = 0). For each continuous variable, it is the change in FEV1 when the variable is increased by 1 unit. A negative value implies greater FEV1 decline.*

*+ Significant at 2-sided P value < 0.05.*

#### **4.3 Propensity score method**

The patient characteristics at baseline that are known to impact FEV1 outcomes are included in the multivariable logistic regression model (Eq. (2)) for estimating propensity scores. **Figure 3** presents histograms of the propensity scores for the Tobi-treated and untreated patient groups, showing different but overlapping propensity score distributions between the two groups. The propensity scores are then grouped into five strata by quintiles. The distribution of propensity scores is compared between the Tobi-treated and untreated patients within each of the five strata; as **Figure 4** shows, within each quintile, the two patient groups present comparable patterns in their likelihood of receiving Tobi.

#### **Figure 3.**

*Histogram of the propensity score distributions by Tobi use (red) and non-use (blue) groups.*

#### **Figure 4.**

*Box-Whisker plots of the distribution of propensity scores by Tobi use (red) and non-use (blue) groups, stratified by the quintiles.*
