**3. Statistical considerations for comparative effectiveness using registry studies**

Statistical analyses in the registry setting are subject to the challenges previously described for analyses of observational studies [10]. Registries are often established for the purpose of evaluating the effects of interventions. The statistical analysis plan should include appropriate methods to test each hypothesis, methods to address biases and confounding arising from various sources, and sample size/power considerations.

#### **3.1 Selection bias**

Regardless of the research question, a registry study will likely be plagued with numerous sources of bias. Selection bias, although inevitable, is typically the most concerning. This type of bias distorts the results for the association of interest and may yield misleading results. Failure to sample from the correct target population and loss to follow-up due to death or some other event are types of selection bias.

A pervasive type of selection bias is confounding by indication, arising from nonrandomized treatment assignment that is often related to the patient's risk of experiencing poor outcomes. This treatment selection bias creates imbalances between the risk profiles of the treated and comparator groups and may violate statistical assumptions in our analyses. In our CF example, treatment selection bias may be more pronounced because the drug in question should only be prescribed to individuals with CF who have a specific chronic infection. Narrowing the cohort to "sicker" individuals can intensify the aforementioned risk-profile imbalance between the Tobi and non-Tobi groups.

Statistical methods to combat treatment selection bias have been applied in previous studies. Approaches to adjust for treatment selection bias include multivariable regression, propensity score methods, matching, and instrumental variables analysis. Stukel et al. [11] applied each of these four approaches to examine the association between cardiac catheterization and long-term acute myocardial infarction mortality and found that the results differed according to the choice of statistical approach. Next, we describe each approach in the context of our CF example.

#### **3.2 Statistical analyses of comparative effectiveness utilized for registry data analysis**

#### *3.2.1 Multivariable regression*

In the absence of randomization, intervention and comparator groups may exhibit large differences with respect to observed covariates recorded in the registry. This approach, sometimes referred to as covariate adjustment, attempts to account for such differences that may distort estimates of intervention effects (**Figure 1**). Most biomedical studies employ ordinary least squares (OLS) regression to adjust the association between the treatment indicator variable $(T_i)$ and outcome variable $(Y)$ for measured confounders $(X_1, \dots, X_K)$. The OLS regression model for each subject $(i = 1, \dots, n)$ specifies

$$y\_i = \beta\_0 + \beta\_1 X\_1 + \dots + \beta\_K X\_K + \gamma T\_i + u\_i \tag{1}$$


*Evaluating Clinical Effectiveness with CF Registries DOI: http://dx.doi.org/10.5772/intechopen.84269*


where $\beta_0$ is the parameter for the model intercept and $u_i$ is an error term. Each of the model parameters $\beta_1, \dots, \beta_K$ corresponds to the association between a measured confounder and the outcome variable. The parameter for the treatment effect is $\gamma$; we denote its OLS estimate as $\hat{\gamma}$. OLS estimation requires that the error term $(u)$ is not correlated with the measured confounders $(X_1, \dots, X_K)$ or the treatment $(T)$. Therefore, the only effect of $T$ on the outcome variable $(Y)$ is the direct effect estimated as $\hat{\gamma}$. The challenge of utilizing a multivariable regression model for comparative effectiveness is that we must appropriately account for the necessary set of confounders. Failure to fully account for necessary confounders may lead to a biased estimate of the treatment effect.
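To make the covariate-adjustment idea in Model (1) concrete, the following minimal sketch (simulated, hypothetical data; NumPy only) fits the OLS regression and contrasts the adjusted estimate of $\gamma$ with the unadjusted difference in group means:

```python
import numpy as np

# Simulated (hypothetical) registry data: one measured confounder X drives
# both treatment assignment T and outcome y; the true treatment effect is 1.0.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)                            # measured confounder
T = (X + rng.normal(size=n) > 0).astype(float)    # treatment more likely when X is high
y = 2.0 + 0.5 * X + 1.0 * T + rng.normal(size=n)  # Model (1) with gamma = 1.0

# Unadjusted comparison of group means is confounded by X
naive = y[T == 1].mean() - y[T == 0].mean()

# OLS fit of Model (1): y = beta0 + beta1 * X + gamma * T + u
design = np.column_stack([np.ones(n), X, T])
beta0_hat, beta1_hat, gamma_hat = np.linalg.lstsq(design, y, rcond=None)[0]
```

The unadjusted difference overstates the treatment effect because treated subjects have systematically higher $X$; the adjusted $\hat{\gamma}$ recovers the effect only if all necessary confounders are in the model.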

#### *3.2.2 Propensity score regression*

The propensity score (PS) is a summary balancing score indicating the likelihood for a patient to receive the active treatment $(T_i = 1)$ given the observed set of confounders $(X_1, \dots, X_K)$, represented in **Figure 1**. It is a balancing score because, by conditioning on the propensity score, one can achieve independence between the treatment assignment and the confounders; therefore, propensity scores help to achieve a quasi-experimental design for the naturally occurring treatment assignment in a registry study. The PS can be estimated through the logistic regression model

$$\text{logit}P(T\_i = \mathbf{1}) = \beta\_0 + \beta\_1 \mathbf{X}\_1 + \dots + \beta\_K \mathbf{X}\_K \tag{2}$$

#### **Figure 1.**

*Causal diagram. The multivariable regression in Model (1) examines the treatment-outcome association after adjustment for measured confounders. The propensity score methods outlined in Model (2) use the measured confounders to balance the treatment groups (exposure). The IV regression in Models (4) and (5) examines the treatment-outcome association to the extent that the exposure is associated with the instrument. The instrument should not be related to the measured confounders; therefore, no arrow is drawn for this relationship.*

where $logit(p) = \log\frac{p}{1-p}$, and the propensity score is estimated by $\hat{\pi}_i = logit^{-1}(\beta_0 + \beta_1 X_1 + \dots + \beta_K X_K)$. There are several propensity score approaches: propensity score adjustment; stratified analyses by quintiles of the propensity score; propensity score sub-classification matching, which matches treated and untreated patients on their propensity score sub-classes (often by percentiles); and inverse weighting of propensity scores. The first approach includes the propensity score directly in the regression equation as a covariate to obtain the adjusted treatment effect,

$$y\_i = \beta\_0 + \beta\_1 \hat{\pi}\_i + \gamma T\_i + u\_i \tag{3}$$

The second and third approaches often categorize patients into five groups using propensity score quintiles. The stratified analyses fit the regression model $y_i = \beta_{0,k} + \gamma_k T_i$ within each quintile $k = 1, \dots, 5$ and estimate the treatment effect by $\hat{\gamma} = \sum_{k=1}^{5} \hat{\gamma}_k / 5$. The PS sub-classification matched analyses match the Tobi and non-Tobi patients on their propensity score groups and then perform analyses for the matched pairs. The propensity matching could also be performed on a finer grouping, for example, using 10 groups, or with fine matching, where a Tobi patient is matched to non-Tobi patient(s) through a distance measure. The method of inverse PS weighting assigns a higher (lower) weight to patients who have a lower (higher) propensity of receiving Tobi, where the weight is defined as $w_i = \frac{T_i}{\hat{\pi}_i} + \frac{1 - T_i}{1 - \hat{\pi}_i}$. The intuition behind the weighting approach comes from survey sampling: through inverse weighting, one can align the Tobi and non-Tobi patients to have comparable distributions of the confounders.

There are advantages and disadvantages to each propensity score method. Comparisons of these methods can be found in an excellent review paper by Austin and Mamdani [12] and the references therein. Different methods are also available for deriving the propensity score. Other than logistic regression, one could use the more flexible classification and regression trees [13], boosted logistic regression [14], or the covariate balancing propensity score method [15]. When applying PS approaches, it is important to check PS balance between the two treatment groups. Patients whose extremely high or low PS values are not compatible with the values of any patients in the other treatment group should be excluded from the PS analyses. The balance check can be presented graphically, usually using the absolute standardized mean difference (SMD).
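As a schematic illustration of these approaches (simulated, hypothetical data; NumPy only; the true treatment effect is 1.0), the sketch below estimates the PS with a logistic model fit by Newton-Raphson, then applies PS adjustment as in Model (3), quintile stratification, and inverse PS weighting, finishing with an SMD balance check:

```python
import numpy as np

# Simulated (hypothetical) data: confounder X raises both the propensity of
# treatment T and the outcome y; true treatment effect gamma = 1.0.
rng = np.random.default_rng(1)
n = 20000
X = rng.normal(size=n)
T = (rng.random(n) < 1 / (1 + np.exp(-(0.2 + 1.2 * X)))).astype(float)
y = 2.0 + 0.5 * X + 1.0 * T + rng.normal(size=n)

naive = y[T == 1].mean() - y[T == 0].mean()   # confounded group comparison

# Model (2): logistic PS model fit by Newton-Raphson iterations
Z = np.column_stack([np.ones(n), X])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    H = Z.T @ (Z * (p * (1 - p))[:, None])    # observed information
    beta += np.linalg.solve(H, Z.T @ (T - p))
pi_hat = 1 / (1 + np.exp(-Z @ beta))

# Approach 1, Model (3): include the PS directly as a covariate
D = np.column_stack([np.ones(n), pi_hat, T])
gamma_adj = np.linalg.lstsq(D, y, rcond=None)[0][2]

# Approach 2: stratify on PS quintiles and average within-stratum effects
edges = np.quantile(pi_hat, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(pi_hat, edges)
gamma_strat = np.mean([y[(strata == k) & (T == 1)].mean()
                       - y[(strata == k) & (T == 0)].mean() for k in range(5)])

# Approach 4: inverse PS weighting, w_i = T_i/pi_i + (1 - T_i)/(1 - pi_i)
w = T / pi_hat + (1 - T) / (1 - pi_hat)
gamma_ipw = (np.average(y[T == 1], weights=w[T == 1])
             - np.average(y[T == 0], weights=w[T == 0]))

# Balance check: absolute SMD of X before and after weighting
def smd(x, t, wt):
    m1 = np.average(x[t == 1], weights=wt[t == 1])
    m0 = np.average(x[t == 0], weights=wt[t == 0])
    v1 = np.average((x[t == 1] - m1) ** 2, weights=wt[t == 1])
    v0 = np.average((x[t == 0] - m0) ** 2, weights=wt[t == 0])
    return abs(m1 - m0) / np.sqrt((v1 + v0) / 2)

smd_before, smd_after = smd(X, T, np.ones(n)), smd(X, T, w)
```

Each PS-based estimate should land near the true effect, while the naive comparison is inflated; the SMD drops markedly after weighting, which is the balance property the text describes.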

#### *3.2.3 Instrumental variables (IV) analysis*

One of our primary analysis goals in the registry setting is to identify potential sources of confounding and make the appropriate adjustments in our statistical analysis. Failure to identify sources of measured confounding results in residual confounding. This type of unaddressed confounding goes into the error term, *u*, which was introduced in Model (1).

Inferential results can also be impacted by what is known as unmeasured confounding. McClellan et al. [16] proposed a technique known as instrumental variables (IV) to combat both measured and unmeasured confounding. We introduce the following notation for IV regression. From Model (1), recall that the variables $(Y_i, T_i, X_i)$ correspond to data from the $i$th patient in the registry, and we assume that there is no correlation between the treatment variable, $T_i$, and the error term, $u_i$. This correlation is present when patients receive treatment based on unmeasured characteristics. Let $R_i$ represent an instrument, and consider the example of a randomized controlled trial: if $R_i$ represents random assignment to treatment, it is the ideal instrument, because by construction it is related to the outcome only through treatment assignment [17].

In the typical clinical setting, a provider does not flip a coin to determine whether she will prescribe her patient treatment A, as opposed to some alternative. By construction, *real-world* data contained in registries represent non-random assignment to treatment. Instead, we identify a variable, an "instrument," that is related to the outcome only through treatment. The variable $R_i$ is a valid instrument, provided the following assumptions are met:

i. $R_i$ is associated with the treatment variable, $T_i$ (instrument relevance); and
ii. $R_i$ is related to the outcome, $Y_i$, only through its effect on treatment and is not related to the measured or unmeasured confounders (exclusion restriction).
Fortunately, assumption (i) is testable by a least-squares regression of the treatment variable on the proposed IV and the measured confounders:

$$T\_i = b\_0 + \lambda R\_i + b\_1 X\_1 + \dots + b\_K X\_K + \delta\_{i\nu} \tag{4}$$

where $b_0$ is the intercept; $b_1, \dots, b_K$ are the parameters corresponding to the aforementioned measured confounders, $X_1, \dots, X_K$; and $\lambda$ is the parameter for the association between the treatment variable, $T_i$, and the IV, $R_i$. The magnitude of this association is a measure of the strength of the instrument [17]: higher magnitude corresponds to greater strength. Let $\hat{T}_i$ be the resulting prediction of the treatment value obtained from Model (4). This association is illustrated in **Figure 1** by the arrow moving downward from the instrument to the exposure.

We continue this approach, often referred to as two-stage least squares regression, by substituting $\hat{T}_i$ from Model (4) into the multiple linear regression defined in Model (1):

$$y\_i = \beta\_0' + \beta\_1' X\_1 + \cdots + \beta\_K' X\_K + \gamma' \hat{T}\_i + \eta\_i \tag{5}$$

In this regression, the same method of estimation is used; however, we use distinct notation because the parameter estimates and residual error will differ from Model (1). Finally, we use the estimate of $\gamma'$ from Model (5) for our interpretations of the treatment effect on the outcome. This estimate corresponds to the association in **Figure 1** from treatment to outcome. Note that it is the same path as in the multiple linear regression, but the treatment effect has been "instrumented." Assumption (ii) cannot be formally tested, but it can be explained in the context of the registry analysis at hand. We provide this type of explanation in our illustrative application. Sensitivity analyses are imperative to determine the robustness of the IV. We recommend analyzing the data in subgroups to understand how these groups may drive heterogeneous treatment effects.

#### **3.3 Time-varying treatment/exposure and covariate**

Incorporating time-varying treatment and/or covariate effects is a pervasive issue in registry data analyses. The fundamental challenge is that changes in treatment and covariates over time often result from a patient's responses and/or experiences with the previous treatment assignment. Thus, simply including the time-varying treatment or covariate in such cases could induce bias in estimating the treatment effect. Special attention is needed to address this issue when analyzing registry data. Relatively few statistical approaches are available to assess time-varying treatment effects or intermediate outcomes. Hogan and Lancaster [18] proposed inverse probability weighting and instrumental variables as time-varying treatment approaches; another population-based approach is the G-computation formula [19].

#### **3.4 Sample size justification**

Completing this process implies that we have carefully considered the hypothesis test and analysis variables, ultimately arriving at a statistical model that will rigorously address the research question. Sample size assessments will differ according to the statistical approach proposed to test the hypothesis, and should incorporate previously established public health or clinical information.

If the statistical approach entails adjustment for confounding and other sources of bias, the sample size calculation is often straightforward. Suppose we plan to test the significance of the treatment effect, $\gamma$, previously defined in Model (1), and we have already identified measured confounders (i.e., covariates) that should be included in the model, referred to as $X_1, \dots, X_K$. Our null hypothesis corresponds to $\gamma = 0$, while our alternative hypothesis corresponds to $\gamma \neq 0$. Testing this hypothesis corresponds to determining sample size/power for a multiple linear regression model [20].

We now reconsider the importance of sample size justification for analyses involving a large registry. Statistical significance depends on the sample size and is typically declared if the *P* value obtained from the test statistic falls below a predetermined threshold (e.g., 0.05). This type of significance may be reached in any study, provided that the sample size is large enough; therefore, in addition to this mathematical criterion, we recommend specifying conditions that must be met to achieve practical (public health or clinical) significance within the context of the research question. In biomedical studies, these criteria can often be defined by determining the minimal clinically important difference (MCID). This technique was originally proposed for clinical trials [21] but has spawned several other approaches [22] to determine the MCID. Once we incorporate the MCID into our null and alternative hypothesis statements, we can perform the sample size calculation that corresponds to our proposed inferential analysis.

#### **3.5 Missing data mechanisms and missing data modeling**

Missing data can occur in the registry setting for a variety of reasons. Simply put, a missing data point is an observation that should have been recorded; however, for some reason, it was not recorded. It is our desire, as analysts, to understand the reason for this "missingness." In this section, we outline practical analytic approaches to identify potential sources attributable to missing data and methods to combat the resulting bias. We begin with a brief description of the three fundamental missing data mechanisms. For an elegant mathematical treatment of the distinctions among the mechanisms, we refer the reader to the original work by Rubin [23].

#### *3.5.1 Missing completely at random (MCAR)*

If the registry data are MCAR, then the reason for missingness is not related to the data that we were able to observe or to the data that we were not able to observe.
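Returning to the IV analysis of Section 3.2.3, the two-stage least squares procedure in Models (4) and (5) can be sketched as follows (simulated, hypothetical data; measured confounders omitted for brevity; the instrument mimics a random "encouragement" that shifts treatment but has no direct path to the outcome):

```python
import numpy as np

# Simulated (hypothetical) data with an unmeasured confounder U and a valid
# instrument R: R shifts treatment T but affects the outcome y only through T.
rng = np.random.default_rng(2)
n = 50000
U = rng.normal(size=n)                              # unmeasured confounder
R = rng.binomial(1, 0.5, size=n).astype(float)      # instrument
T = (0.8 * R + U + rng.normal(size=n) > 0.4).astype(float)
y = 1.0 + 1.0 * T + 1.5 * U + rng.normal(size=n)    # true treatment effect = 1.0

def ols(Z, target):
    """Least-squares coefficients of target on the columns of Z."""
    return np.linalg.lstsq(Z, target, rcond=None)[0]

# Naive OLS of y on T is biased: T is correlated with U, which sits in the error
naive = ols(np.column_stack([np.ones(n), T]), y)[1]

# Stage 1 (Model (4)): regress T on the instrument; keep the fitted values T_hat
Z1 = np.column_stack([np.ones(n), R])
T_hat = Z1 @ ols(Z1, T)

# Stage 2 (Model (5)): regress y on T_hat; its coefficient is the IV estimate
gamma_iv = ols(np.column_stack([np.ones(n), T_hat]), y)[1]
```

The naive estimate absorbs the unmeasured confounding through $U$, while the instrumented estimate uses only the variation in treatment induced by $R$, which is unrelated to $U$, and therefore lands near the true effect.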
