#### **7. Conclusion**

This chapter has presented bounds on the ACE in randomized trials with noncompliance. Although the results are stated for causal differences, they can also be readily applied to the causal risk ratio when the outcome is binary.

<sup>2</sup> If (*s*, *t*) = (2, 1) in the MTR and RMTR (Assumptions 2.1 and 2.2) is used, the lower bound is 0 under the MTR and the upper bound is 0 under the RMTR.

<sup>3</sup> If (*s*, *t*) = (2, 1) in the MTR and RMTR (Assumptions 2.1 and 2.2) is additionally used, a candidate for the lower bound is 0 under the MTR and a candidate for the upper bound is 0 under the RMTR.

Causal Inference in Randomized Trials with Noncompliance 331


It is generally thought that the ITT analysis is likely to yield a downwardly biased estimate of causal effects (Sheiner & Rubin, 1995), whereas the PP analysis is likely to yield an upwardly biased estimate (Lewis & Machine, 1993). This suggests that the ACE lies between the results of the ITT and PP analyses. As shown in Section 4.1, this is true under IV + MTR + MTS or under IV + RMTR + RMTS when noncompliance takes the form of switching treatments. However, as shown in Sections 5 and 6, we cannot be certain that it is true when noncompliance is due to receiving no treatment and/or when the IV assumption does not hold. Investigators should therefore not simply conclude that the ACE lies between the results of the ITT and PP analyses. Unfortunately, no standard method currently exists for estimating the ACE in randomized trials with noncompliance. Investigators should consider whether the assumptions presented in this chapter are valid and then derive bounds on the ACE using the methodology described herein.

Further methodological work in this field is needed in three directions. The first is to find assumptions weaker than those given here that nevertheless yield the same bounds. The second is to find assumptions that yield narrower bounds and are still reasonable in some situations; the ideal is a reasonable assumption that gives a point estimator. The third is to extend the discussion in this chapter to more complex situations: for example, the two types of noncompliance considered in this chapter may occur simultaneously, and more than two arms may be compared (Cheng & Small, 2006).

Another recent interest in causal inference is the statistical analysis of the role of an intermediate variable between a particular treatment and outcome (Rubin, 2004; Joffe et al., 2007; VanderWeele, 2008b). Investigators are often interested in understanding how the effect of a treatment on an outcome may be mediated through an intermediate variable. For example, in the MRFIT, investigators may be interested in how the effect of the multifactor intervention program on CHD mortality is mediated through smoking status 1 year after entry, rather than in the effect of smoking status 1 year after entry on CHD mortality. Such analyses are closely related to inference with a surrogate marker and to post-randomization selection bias and truncation by death (Zhang & Rubin, 2003; Chiba & VanderWeele, 2011). Further methodological research is needed to address these issues.

#### **8. Appendix: Derivations of equations and inequalities**

This appendix outlines the derivations of the equations and inequalities presented in Sections 3, 4 and 6; they are given in Sections 8.1, 8.2 and 8.3, respectively.

#### **8.1 Derivations of equations in Section 3**

Equation (3.1) can be derived as follows:

$$\begin{aligned} \operatorname{E}(Y_{X=2}) &= \operatorname{E}(Y_{X=2} \mid R=r) \\ &= \sum_{x=1,2} \operatorname{E}(Y_{X=2} \mid X=x, R=r) \operatorname{Pr}(X=x \mid R=r) \\ &= (E_{2r} - \alpha_r) p_{1|r} + E_{2r} p_{2|r} \\ &= E_{2r} - \alpha_r p_{1|r}. \end{aligned}$$

The first equation holds by Assumption 1, and the third equation is derived by substituting E(*YX*=2|*X* = 1, *R* = *r*) = E(*YX*=2|*X* = 2, *R* = *r*) – *αr* and applying the consistency assumption: E(*YX*=2|*X* = 2, *R* = *r*) = E(*Y*|*X* = 2, *R* = *r*) (= *E*2*<sup>r</sup>*). A similar calculation derives equation (3.2).

To derive equation (3.3), we consider the difference between E(*Y*|*R* = 2) and E(*YX*=2) and between E(*Y*|*R* = 1) and E(*YX*=1). The former difference derives:

$$\begin{aligned} \operatorname{E}(Y \mid R=2) - \operatorname{E}(Y_{X=2}) &= \sum_{x=1,2} E_{x2} p_{x|2} - (E_{22} - \alpha_2 p_{1|2}) \\ &= (E_{12} + \alpha_2) p_{1|2} - E_{22} (1 - p_{2|2}) \\ &= \{\alpha_2 - (E_{22} - E_{12})\} p_{1|2}. \end{aligned}$$

By a similar calculation, the latter difference becomes E(*Y*|*R* = 1) – E(*YX*=1) = {*β*1 – (*E*21 – *E*11)}*p*2|1. The difference between these equations derives equation (3.3).

The derivation of equation (3.5) is as follows. Simple algebra, *p*2|*<sup>r</sup>* × equation (3.1) plus *p*1|*<sup>r</sup>* × equation (3.2), yields *p*2|*<sup>r</sup>*ACE + E(*YX*=1) = E(*Y*|*R* = *r*) – (*αr* – *βr*)*p*1|*rp*2|*<sup>r</sup>*. The difference between this equation with *r* = 2 and that with *r* = 1 is:

$$(p_{2|2} - p_{2|1})\,\text{ACE} = \operatorname{E}(Y \mid R=2) - \operatorname{E}(Y \mid R=1) - (\alpha_2 - \beta_2) p_{1|2} p_{2|2} + (\alpha_1 - \beta_1) p_{1|1} p_{2|1}.$$

This equation implies equation (3.5) for *p*2|2 ≠ *p*2|1.
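As a numerical illustration of this last step, the following Python sketch solves the displayed equation for the ACE. It is a minimal sketch, not part of the original derivation: the function name `ace_from_sensitivity` and all input values are hypothetical, and the sensitivity parameters *αr* and *βr* cannot be estimated from the data, so they must be supplied by the analyst. When *αr* = *βr* in both arms, the expression reduces to the ITT contrast divided by *p*2|2 – *p*2|1.

```python
def ace_from_sensitivity(ey_r1, ey_r2, p2_r1, p2_r2, alpha, beta):
    """Solve (p2|2 - p2|1) ACE = E(Y|R=2) - E(Y|R=1)
       - (a2 - b2) p1|2 p2|2 + (a1 - b1) p1|1 p2|1 for the ACE.

    alpha and beta map the arm r (1 or 2) to the sensitivity
    parameters alpha_r and beta_r (analyst-supplied assumptions).
    """
    denom = p2_r2 - p2_r1
    if denom == 0:
        raise ValueError("requires p2|2 != p2|1")
    p1_r1, p1_r2 = 1.0 - p2_r1, 1.0 - p2_r2
    numer = (ey_r2 - ey_r1
             - (alpha[2] - beta[2]) * p1_r2 * p2_r2
             + (alpha[1] - beta[1]) * p1_r1 * p2_r1)
    return numer / denom


# Hypothetical trial summaries (not taken from the chapter).
zero = {1: 0.0, 2: 0.0}
iv_like = ace_from_sensitivity(0.30, 0.40, 0.10, 0.90, zero, zero)
shifted = ace_from_sensitivity(0.30, 0.40, 0.10, 0.90,
                               {1: 0.0, 2: 0.05}, zero)
```

Varying `alpha` and `beta` over a plausible range gives a simple sensitivity analysis around the value obtained with all parameters set to zero.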

#### **8.2 Derivations of inequalities in Section 4**

Inequality (4.1) can be derived as presented below. By substituting *K*0 ≤ E(*YX*=*<sup>x</sup>*|*X* = *x\**, *R* = *r*) ≤ *K*1 for *x* ≠ *x\** and E(*YX*=*<sup>x</sup>*|*X* = *x\**, *R* = *r*) = E(*Y*|*X* = *x*, *R* = *r*) (= *Exr*) for *x* = *x\** (consistency assumption) into:

$$\operatorname{E}(Y_{X=x} \mid R=r) = \sum_{x^{*}=1,2} \operatorname{E}(Y_{X=x} \mid X=x^{*}, R=r) \operatorname{Pr}(X=x^{*} \mid R=r), \tag{8.1}$$

we obtain:

330 Health Management – Different Approaches and Solutions


$$E_{xr} p_{x|r} + K_0 p_{x^{*}|r} \le \operatorname{E}(Y_{X=x} \mid R=r) \le E_{xr} p_{x|r} + K_1 p_{x^{*}|r} \tag{8.2}$$

for *x* ≠ *x\**. Because E(*YX*=*<sup>x</sup>*) = E(*YX*=*<sup>x</sup>*|*R* = *r*) by Assumption 1, the bounds of E(*YX*=*<sup>x</sup>*) become:

$$\max \begin{Bmatrix} E_{x1} p_{x|1} + K_0 p_{x^{*}|1} \\ E_{x2} p_{x|2} + K_0 p_{x^{*}|2} \end{Bmatrix} \le \operatorname{E}(Y_{X=x}) \le \min \begin{Bmatrix} E_{x1} p_{x|1} + K_1 p_{x^{*}|1} \\ E_{x2} p_{x|2} + K_1 p_{x^{*}|2} \end{Bmatrix}$$

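for *x* ≠ *x\**. Applying this inequality to *x* = 1 and *x* = 2 and taking the difference, that is, subtracting the upper bound for *x* = 1 from the lower bound for *x* = 2 and the lower bound for *x* = 1 from the upper bound for *x* = 2, gives inequality (4.1).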
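These bounds can be evaluated directly from arm-level summaries. The following Python sketch is a minimal illustration under two assumptions: there are only two treatments, so that *px\**|*r* = 1 – *px*|*r*, and the bounds on the ACE are obtained by differencing the bounds for *x* = 2 and *x* = 1. The function names and the numerical inputs are hypothetical and are not taken from the chapter.

```python
def mean_bounds(e_x, p_x, k0, k1):
    """Bounds on E(Y_{X=x}) from inequality (8.2), combined over arms.

    e_x[r] = E(Y | X=x, R=r) and p_x[r] = Pr(X=x | R=r) for r in (1, 2);
    k0 and k1 are the smallest and largest possible outcome values.
    """
    lower = max(e_x[r] * p_x[r] + k0 * (1 - p_x[r]) for r in (1, 2))
    upper = min(e_x[r] * p_x[r] + k1 * (1 - p_x[r]) for r in (1, 2))
    return lower, upper


def ace_bounds(e, p2, k0=0.0, k1=1.0):
    """Bounds on ACE = E(Y_{X=2}) - E(Y_{X=1}).

    e[(x, r)] = E(Y | X=x, R=r); p2[r] = Pr(X=2 | R=r).
    """
    lo2, up2 = mean_bounds({r: e[(2, r)] for r in (1, 2)},
                           {r: p2[r] for r in (1, 2)}, k0, k1)
    lo1, up1 = mean_bounds({r: e[(1, r)] for r in (1, 2)},
                           {r: 1 - p2[r] for r in (1, 2)}, k0, k1)
    return lo2 - up1, up2 - lo1


# Hypothetical binary-outcome example (K0 = 0, K1 = 1).
e = {(1, 1): 0.30, (2, 1): 0.50, (1, 2): 0.20, (2, 2): 0.40}
p2 = {1: 0.10, 2: 0.90}
ace_lower, ace_upper = ace_bounds(e, p2)
```

With these inputs the interval for the ACE is roughly [–0.01, 0.19]; it is narrow here only because the hypothetical compliance rates are high.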

Inequality (4.3) can also be derived using equation (8.1). Assumption 2.1 implies that E(*YX*=2|*X* = *x*, *R* = *r*) ≥ E(*YX*=1|*X* = *x*, *R* = *r*). Thus, by substituting this inequality with *x* = 1 into equation (8.1), we obtain:

$$\begin{aligned} \operatorname{E}(Y_{X=2}) &= \operatorname{E}(Y_{X=2} \mid R=r) \\ &\ge \sum_{x=1,2} \operatorname{E}(Y_{X=x} \mid X=x, R=r) \operatorname{Pr}(X=x \mid R=r) \\ &= \sum_{x=1,2} \operatorname{E}(Y \mid X=x, R=r) \operatorname{Pr}(X=x \mid R=r) \\ &= \operatorname{E}(Y \mid R=r), \end{aligned} \tag{8.3}$$

and thus E(*YX*=2) ≥ max{E(*Y*|*R* = 1), E(*Y*|*R* = 2)}. Similarly, E(*YX*=1) ≤ min{E(*Y*|*R* = 1), E(*Y*|*R* = 2)} by substituting E(*YX*=2|*X* = 2, *R* = *r*) ≥ E(*YX*=1|*X* = 2, *R* = *r*) into equation (8.1). The difference between them is inequality (4.3).


In the case of a binary outcome variable, inequality (4.3) can also be derived under Assumption 3.1, which states that Pr(*YX=*2 = 1, *YX=*1 = 0|*X* = *x*, *R* = *r*) ≥ Pr(*YX=*2 = 0, *YX=*1 = 1|*X* = *x*, *R* = *r*). Adding Pr(*YX=*2 = 1, *YX=*1 = 1|*X* = *x*, *R* = *r*) to both sides of this inequality gives Pr(*YX=*2 = 1|*X* = *x*, *R* = *r*) ≥ Pr(*YX=*1 = 1|*X* = *x*, *R* = *r*). Because this is the binary-outcome version of E(*YX*=2|*X* = *x*, *R* = *r*) ≥ E(*YX*=1|*X* = *x*, *R* = *r*), inequality (4.3) follows.

Inequality (4.4) can be derived as follows. Substituting E(*YX*=2|*X* = 2, *R* = *r*) ≥ E(*YX*=2|*X* = 1, *R* = *r*) (*x* = 2 and (*s*, *t*) = (2, 1) in Assumption 4.1) into equation (8.1) yields:

$$\begin{aligned} \operatorname{E}(Y_{X=2}) &= \operatorname{E}(Y_{X=2} \mid R=r) \\ &\le \sum_{x=1,2} \operatorname{E}(Y_{X=2} \mid X=2, R=r) \operatorname{Pr}(X=x \mid R=r) \\ &= \operatorname{E}(Y \mid X=2, R=r) \;(= E_{2r}), \end{aligned} \tag{8.4}$$

and thus E(*YX*=2) ≤ min{*E*21, *E*22}. Similarly, E(*YX*=1) ≥ max{*E*11, *E*12} by substituting E(*YX*=1|*X* = 2, *R* = *r*) ≥ E(*YX*=1|*X* = 1, *R* = *r*) (*x* = 1 and (*s*, *t*) = (2, 1) in Assumption 4.1) into equation (8.1). The difference between them is inequality (4.4).
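The one-sided bounds behind inequalities (4.3) and (4.4) are simple enough to compute by hand, but a short Python sketch may help to show how they combine. It assumes that (4.3) gives the lower bound max{E(*Y*|*R* = 1), E(*Y*|*R* = 2)} – min{E(*Y*|*R* = 1), E(*Y*|*R* = 2)} and that (4.4) gives the upper bound min{*E*21, *E*22} – max{*E*11, *E*12}, as the derivations above indicate. The function names and data are hypothetical.

```python
def mtr_lower_bound(ey):
    """Lower bound on the ACE under IV + MTR; ey[r] = E(Y | R=r)."""
    return max(ey[1], ey[2]) - min(ey[1], ey[2])


def mts_upper_bound(e):
    """Upper bound on the ACE under IV + MTS; e[(x, r)] = E(Y | X=x, R=r)."""
    return min(e[(2, 1)], e[(2, 2)]) - max(e[(1, 1)], e[(1, 2)])


# Hypothetical summaries: e[(x, r)] and p2[r] = Pr(X=2 | R=r).
e = {(1, 1): 0.30, (2, 1): 0.50, (1, 2): 0.20, (2, 2): 0.40}
p2 = {1: 0.10, 2: 0.90}

# E(Y | R=r) is the compliance-weighted average of E_1r and E_2r.
ey = {r: e[(1, r)] * (1 - p2[r]) + e[(2, r)] * p2[r] for r in (1, 2)}

lower = mtr_lower_bound(ey)
upper = mts_upper_bound(e)
```

In this made-up example the lower limit equals the ITT contrast E(*Y*|*R* = 2) – E(*Y*|*R* = 1) = 0.06 and the upper limit equals *E*22 – *E*11 = 0.10.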

Inequality (4.4) can also be derived under Assumption 5.1. To prove this, we need the following lemma (Esary et al., 1967):

#### LEMMA 1

*Let f and g be functions with n real-valued arguments such that both f and g are non-decreasing or non-increasing in each of their arguments. If Z = (Z*1*, …, Zn) is a multivariate random variable with n components such that each component is independent of the other components, then* Cov{*f*(*Z*)*, g*(*Z*)} *≥* 0*.*
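The lemma can be checked numerically on a small discrete example. The Python sketch below is only an illustration under assumed inputs: it enumerates every value of a two-component *Z* with independent components and computes the covariance exactly. The helper `covariance`, the marginal distributions and the functions `f` and `g` are all hypothetical.

```python
from itertools import product


def covariance(f, g, marginals):
    """Exact Cov{f(Z), g(Z)} for Z with independent discrete components.

    marginals[i] maps each value of component Z_i to its probability.
    """
    e_f = e_g = e_fg = 0.0
    for combo in product(*(m.items() for m in marginals)):
        z = tuple(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence: joint probability is the product
        e_f += prob * f(z)
        e_g += prob * g(z)
        e_fg += prob * f(z) * g(z)
    return e_fg - e_f * e_g


marginals = [{0: 0.4, 1: 0.6}, {0: 0.7, 1: 0.2, 2: 0.1}]
f = lambda z: 0.1 + 0.2 * z[0] + 0.1 * z[1]         # non-decreasing in each argument
g = lambda z: 0.2 + 0.3 * z[0] + 0.05 * z[1] ** 2   # non-decreasing in each argument

cov_same = covariance(f, g, marginals)
cov_opposite = covariance(f, lambda z: 1.0 - g(z), marginals)
```

Here `cov_same` is non-negative, as the lemma states, while replacing `g` by `1 - g` (a non-increasing function) flips the sign.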

Let *fr*(*Z*) = E(*Y*|*X* = 2, *R* = *r*, *Z* = *z*), *gr*(*Z*) = Pr(*X* = 2|*R* = *r*, *Z* = *z*) and *FZ*|*R*=*<sup>r</sup>* denote the cumulative distribution function of *Z* conditional on *R* = *r*. Then, by Lemma 1, we obtain:

$$\operatorname{E}_{F_{Z|R=r}}\{f_r(Z) g_r(Z)\} - \operatorname{E}_{F_{Z|R=r}}\{f_r(Z)\} \operatorname{E}_{F_{Z|R=r}}\{g_r(Z)\} = \operatorname{Cov}_{F_{Z|R=r}}\{f_r(Z), g_r(Z)\} \ge 0,$$

if both *fr*(*Z*) and *gr*(*Z*) are non-decreasing or non-increasing in *z* and the components of *Z* are independent. Thus, using the assumption that *YX*=*<sup>x</sup>* is independent of *X* given *R* and *Z*, the following inequality is derived:

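$$\begin{aligned} \operatorname{E}(Y_{X=2} \mid X=1, R=r) &= \sum_{z} \operatorname{E}(Y_{X=2} \mid X=1, R=r, Z=z) \operatorname{Pr}(Z=z \mid X=1, R=r) \\ &= \frac{\sum_{z} \operatorname{E}(Y \mid X=2, R=r, Z=z) \operatorname{Pr}(X=1 \mid R=r, Z=z) \operatorname{Pr}(Z=z \mid R=r)}{\operatorname{Pr}(X=1 \mid R=r)} \\ &= \frac{\operatorname{E}_{F_{Z|R=r}}[f_r(Z)\{1 - g_r(Z)\}]}{\operatorname{Pr}(X=1 \mid R=r)} \\ &\le \frac{\operatorname{E}_{F_{Z|R=r}}\{f_r(Z)\} \operatorname{E}_{F_{Z|R=r}}\{1 - g_r(Z)\}}{\operatorname{Pr}(X=1 \mid R=r)} \\ &= \operatorname{E}_{F_{Z|R=r}}\{f_r(Z)\} \\ &= \frac{\operatorname{E}_{F_{Z|R=r}}\{f_r(Z)\} \operatorname{E}_{F_{Z|R=r}}\{g_r(Z)\}}{\operatorname{Pr}(X=2 \mid R=r)} \\ &\le \frac{\operatorname{E}_{F_{Z|R=r}}\{f_r(Z) g_r(Z)\}}{\operatorname{Pr}(X=2 \mid R=r)} \\ &= \frac{\sum_{z} \operatorname{E}(Y \mid X=2, R=r, Z=z) \operatorname{Pr}(X=2 \mid R=r, Z=z) \operatorname{Pr}(Z=z \mid R=r)}{\operatorname{Pr}(X=2 \mid R=r)} \\ &= \operatorname{E}(Y \mid X=2, R=r). \end{aligned}$$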


The second equation holds because E(*YX*=2|*X* = 1, *R* = *r*, *Z* = *z*) = E(*YX*=2|*X* = 2, *R* = *r*, *Z* = *z*) = E(*Y*|*X* = 2, *R* = *r*, *Z* = *z*) by the independence and consistency assumptions. The fourth relation, an inequality, holds because 1 – *gr*(*Z*) is non-increasing when *gr*(*Z*) is non-decreasing, so that the covariance of *fr*(*Z*) and 1 – *gr*(*Z*) is non-positive by Lemma 1. The fifth and sixth equations hold because:

$$\operatorname{E}_{F_{Z|R=r}}\{g_r(Z)\} = \sum_{z} \operatorname{Pr}(X=2 \mid R=r, Z=z) \operatorname{Pr}(Z=z \mid R=r) = \operatorname{Pr}(X=2 \mid R=r).$$

A similar calculation derives E(*YX*=1|*X* = 2, *R* = *r*) ≥ E(*Y*|*X* = 1, *R* = *r*). The inequalities derived here are the same as those in Assumption 4.1. Therefore, inequality (4.4) can be derived under Assumption 5.1.
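The chain of relations used in this argument can be traced numerically for one arm *r*. The following Python sketch is a minimal illustration with a single discrete covariate *Z*; the function `mts_chain` and all probabilities are hypothetical. Both *fr* and *gr* are chosen to be non-decreasing in *z*, and *YX*=2 is assumed to be independent of *X* given *R* and *Z*, as in the derivation.

```python
def mts_chain(pz, f, g):
    """Return (E(Y_{X=2}|X=1,R=r), E_F{f_r(Z)}, E(Y|X=2,R=r)).

    pz[z] = Pr(Z=z | R=r); f[z] = E(Y | X=2, R=r, Z=z);
    g[z] = Pr(X=2 | R=r, Z=z).
    """
    p2 = sum(g[z] * pz[z] for z in pz)   # Pr(X=2 | R=r)
    p1 = 1.0 - p2                        # Pr(X=1 | R=r)
    mean_f = sum(f[z] * pz[z] for z in pz)
    among_x1 = sum(f[z] * (1 - g[z]) * pz[z] for z in pz) / p1
    among_x2 = sum(f[z] * g[z] * pz[z] for z in pz) / p2
    return among_x1, mean_f, among_x2


pz = {0: 0.5, 1: 0.3, 2: 0.2}
f = {0: 0.2, 1: 0.4, 2: 0.6}   # non-decreasing in z
g = {0: 0.3, 1: 0.5, 2: 0.8}   # non-decreasing in z
among_x1, mean_f, among_x2 = mts_chain(pz, f, g)
```

The three returned values are ordered as in the derivation, with the mean potential outcome among those who received treatment 1 being the smallest and the observed mean among those who received treatment 2 being the largest.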

#### **8.3 Derivations of inequalities in Section 6**

E(*YX*=*<sup>x</sup>*) can be expressed as E(*YX*=*<sup>x</sup>*) = E(*YX*=*<sup>x</sup>*|*R* = 1)Pr(*R* = 1) + E(*YX*=*<sup>x</sup>*|*R* = 2)Pr(*R* = 2). Therefore,

$$\operatorname{E}(Y_{X=x} \mid R=1) \le \operatorname{E}(Y_{X=x}) \le \operatorname{E}(Y_{X=x} \mid R=2) \tag{8.5}$$

under Assumption 6.1 (MIV: E(*YX*=*<sup>x</sup>*|*R* = 2) ≥ E(*YX*=*<sup>x</sup>*|*R* = 1)). All bounds under the MIV are derived from inequality (8.5), whereas those under the IV (Assumption 1) are based on E(*YX*=*<sup>x</sup>*) = E(*YX*=*<sup>x</sup>*|*R* = *r*). This is why inequality (6.1) corresponds to inequality (4.1) with only one of *a* or *b* retained in each max{*a*, *b*} and min{*a*, *b*}. The same holds under the RMIV (Assumption 6.2): inequality (6.2) and the bounds in Tables 3 and 4 correspond to the bounds presented in Section 4.1 with only one of *a* or *b* retained in each max{*a*, *b*} and min{*a*, *b*}. The derivations of the bounds in Section 6.1 are therefore simple. Inequality (6.1) is derived by combining inequalities (8.2) and (8.5).

In Table 3, ACE ≥ max{−ITT, 0} under the MIV and MTR is derived as follows. Because E(*YX*=2) ≥ E(*YX*=2|*R* = 1) from inequality (8.5), E(*YX*=2) ≥ E(*Y*|*R* = 1) from inequality (8.3) with *r* = 1. Likewise, E(*YX*=1) ≤ E(*Y*|*R* = 2) by E(*YX*=1) ≤ E(*YX*=1|*R* = 2) and the MTR (Assumption 2.1). The difference between these inequalities gives ACE ≥ −ITT. Additionally, the MTR gives ACE = E(*YX*=2) − E(*YX*=1) ≥ 0 directly. The other bounds in Table 3 can be derived in a similar way. In Table 4, ACE ≤ *E*22 – *E*11 under the MIV and MTS is derived as follows. Because E(*YX*=2) ≤ E(*YX*=2|*R* = 2) from inequality (8.5), E(*YX*=2) ≤ *E*22 from inequality (8.4) with *r* = 2. Likewise, E(*YX*=1) ≥ *E*11 by E(*YX*=1) ≥ E(*YX*=1|*R* = 1) and the MTS (Assumption 4.1). The difference between these inequalities gives ACE ≤ *E*22 – *E*11. The other bounds in Table 4 can be derived in a similar way.
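The two bounds derived in this paragraph can be written as a pair of one-line functions. This Python sketch is illustrative only: it takes ITT = E(*Y*|*R* = 2) – E(*Y*|*R* = 1), as implied by the derivation above, and the function names and inputs are hypothetical.

```python
def miv_mtr_lower_bound(ey_r1, ey_r2):
    """Lower bound max{-ITT, 0} on the ACE under MIV + MTR,
    with ITT = E(Y | R=2) - E(Y | R=1)."""
    itt = ey_r2 - ey_r1
    return max(-itt, 0.0)


def miv_mts_upper_bound(e11, e22):
    """Upper bound E22 - E11 on the ACE under MIV + MTS."""
    return e22 - e11


# Hypothetical inputs: a positive ITT, a negative ITT, and E11, E22.
lower_pos_itt = miv_mtr_lower_bound(0.32, 0.38)
lower_neg_itt = miv_mtr_lower_bound(0.38, 0.32)
upper = miv_mts_upper_bound(0.30, 0.40)
```

When the ITT contrast is positive the MTR term 0 is the binding lower bound, whereas a negative ITT contrast makes −ITT binding.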

The inequalities in Section 6.2 can be derived in the same manner as those in Section 6.1 by replacing *x* = 1, 2 with *x* = 0, 1 and *x* = 0, 2, although the resulting expressions may be somewhat more complex.

#### **9. Acknowledgment**

This work was supported in part by a Grant-in-Aid for Scientific Research (No. 23700344) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

#### **10. References**

Angrist, J.D.; Imbens, G.W. & Rubin, D.B. (1996). Identification of causal effects using instrumental variables (with discussions). *Journal of the American Statistical Association,* Vol.91, No.434, (June 1996), pp.444-472, ISSN 0162-1459

Balke, A. & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. *Journal of the American Statistical Association,* Vol.92, No.439, (September 1997), pp.1171-1176, ISSN 0162-1459

Brumback, B.A.; Hernán, M.A.; Haneuse, S.J.P.A. & Robins, J.M. (2004). Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. *Statistics in Medicine,* Vol.23, No.5, (March 2004), pp.749-767, ISSN 1097-0258

Cai, Z.; Kuroki, M. & Sato, T. (2007). Non-parametric bounds on treatment effects with non-compliance by covariate adjustment. *Statistics in Medicine,* Vol.26, No.16, (July 2007), pp.3188-3204, ISSN 1097-0258

Cheng, J. & Small, D.S. (2006). Bounds on causal effects in three-arm trials with non-compliance. *Journal of the Royal Statistical Society, Series B,* Vol.68, No.5, (November 2006), pp.815-836, ISSN 0964-1998

Chiba, Y. (2009a). The sign of the unmeasured confounding bias under various standard populations. *Biometrical Journal,* Vol.51, No.4, (August 2009), pp.670-676, ISSN 0323-3847

Chiba, Y. (2009b). Bounds on causal effects in randomized trials with noncompliance under monotonicity assumptions about covariates. *Statistics in Medicine,* Vol.28, No.26, (November 2009), pp.3249-3259, ISSN 1097-0258

Chiba, Y. (2010a). Bias analysis of the instrumental variable estimator as an estimator of the average causal effect. *Contemporary Clinical Trials,* Vol.31, No.1, (January 2010), pp.12-17, ISSN 1551-7144

Chiba, Y. (2010b). An approach for estimating causal effects in randomized trials with noncompliance. *Communications in Statistics – Theory and Methods,* Vol.39, No.12, (January 2010), pp.2146-2156, ISSN 0361-0926

Chiba, Y. (2010c). The monotone instrumental variable in randomized trials with noncompliance. *Japanese Journal of Biometrics,* Vol.31, No.2, (December 2010), pp.93-106, ISSN 0918-4430

Chiba, Y. (2011). An alternative assumption for assessing the sign of causal effects. *Oriental Journal of Statistical Methods, Theory and Applications,* in press, ISSN Awaited

Chiba, Y.; Sato, T. & Greenland, S. (2007). Bounds on potential risks and causal risk differences under assumptions about confounding parameters. *Statistics in Medicine,* Vol.26, No.28, (December 2007), pp.5125-5135, ISSN 1097-0258

Chiba, Y. & VanderWeele, T.J. (2011). A simple method for principal strata effects when the outcome has been truncated due to death. *American Journal of Epidemiology,* Vol.173, No.7, (April 2011), pp.745-751, ISSN 0002-9262

Coronary Drug Project Research Group (1980). Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. *New England Journal of Medicine,* Vol.303, No.18, (October 1980), pp.1038-1041, ISSN 0028-4793

Cuzick, J.; Edwards, R. & Segnan, N. (1997). Adjustment for non-compliance and contamination in randomized clinical trials. *Statistics in Medicine,* Vol.16, No.9, (May 1997), pp.1017-1029, ISSN 1097-0258

Esary, J.D.; Proschan, F. & Walkup, D.W. (1967). Association of random variables, with applications. *Annals of Mathematical Statistics,* Vol.38, No.5, (October 1967), pp.1466-1474, ISSN 0003-4851

Fisher, L.D.; Dixon, D.O.; Herson, J.; Frankowski, R.; Hearron, M. & Peace, K.E. (1990). Intention to treat in clinical trials, In: *Statistical Issues in Drug Research and Development*, K.E. Peace (Ed.), 331-350, Marcel Dekker, ISBN 0-8247-8290-9, New York, USA

Frangakis, C.E. & Rubin, D.B. (2002). Principal stratification in causal inference. *Biometrics,* Vol.58, No.1, (March 2002), pp.21-29, ISSN 0006-341X

Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. *International Journal of Epidemiology,* Vol.29, No.4, (August 2000), pp.722-729, ISSN 0300-5771

Greenland, S. & Robins, J.M. (1986). Identifiability, exchangeability and epidemiologic confounding. *International Journal of Epidemiology,* Vol.15, No.3, (June 1986), pp.413-419, ISSN 0300-5771

Hernán, M.A. & Robins, J.M. (2006). Instruments for causal inference: An epidemiologist's dream? *Epidemiology,* Vol.17, No.4, (July 2006), pp.360-372, ISSN 1044-3983

Holland, P.W. (1986). Statistics and causal inference (with discussions). *Journal of the American Statistical Association,* Vol.81, No.396, (December 1986), pp.945-970, ISSN 0162-1459

Joffe, M.; Small, D. & Hsu, C.-Y. (2007). Defining and estimating intervention effects for groups that will develop an auxiliary outcome. *Statistical Science,* Vol.22, No.1, (February 2007), pp.74-97, ISSN 0883-4237

Lee, Y.; Ellenberg, J.; Hirtz, D. & Nelson, K. (1991). Analysis of clinical trials by treatment actually received: Is it really an option? *Statistics in Medicine,* Vol.10, No.10, (October 1991), pp.1595-1605, ISSN 1097-0258

Lewis, J.A. & Machine, D. (1993). Intention to treat – who should use ITT? *British Journal of Cancer,* Vol.68, No.4, (October 1993), pp.647-650, ISSN 0007-0920

Manski, C.F. (1990). Nonparametric bounds on treatment effects. *American Economic Review,* Vol.80, No.2, (May 1990), pp.319-323, ISSN 0002-8282

Manski, C.F. (1997). Monotone treatment response. *Econometrica,* Vol.65, No.6, (November 1997), pp.1311-1334, ISSN 0012-9682

Manski, C.F. (2003). *Partial identification of probability distributions*, Springer-Verlag, ISBN 0-387-00454-8, New York, USA

Manski, C.F. & Pepper, J.V. (2000). Monotone instrumental variables: With an application to the returns to schooling. *Econometrica,* Vol.68, No.4, (July 2000), pp.997-1010, ISSN 0012-9682

Manski, C.F. & Pepper, J.V. (2009). More on monotone instrumental variables. *Econometrics Journal,* Vol.12, No.S1, (January 2009), pp.S200-S216, ISSN 1368-4221

Mark, S.D. & Robins, J.M. (1993). A method for the analysis of randomized trials with noncompliance information: An application to the multiple risk factor intervention trial. *Controlled Clinical Trials,* Vol.14, No.2, (April 1993), pp.79-97, ISSN 1551-7144

Matsui, S. (2005). Stratified analysis in randomized trials with noncompliance. *Biometrics,* Vol.61, No.3, (September 2005), pp.816-823, ISSN 0006-341X

Multiple Risk Factor Intervention Trial Research Group (1982). Multiple risk factor intervention trial: Risk factor changes and mortality results. *Journal of the American Medical Association,* Vol.248, No.12, (September 1982), pp.1465-1477, ISSN 0098-7484

Piantadosi, S. (1997). *Clinical Trials: A Methodologic Perspective*, Wiley, ISBN 0-471-16393-7, New York, USA

**19**

**Design of Scoring Models for Trustworthy Risk Prediction in Critical Patients**

Paolo Barbini and Gabriele Cevenini

*Department of Surgery and Bioengineering, University of Siena*

*Italy*

#### **1. Introduction**

Prediction of an adverse health event (AHE) from objective data is of great importance in clinical practice. A health event is inherently dichotomous as it either happens or does not happen, and in the latter case, it is a favourable health event (FHE).

In many clinical applications, it is relevant not only to predict whether an AHE will happen (diagnostic ability) but also to estimate in advance the individual risk of its occurrence using ordered multinomial or quantitative scales (prognostic ability) such as probability. An estimated probability of a patient's outcome is usually preferred to a simpler binary decision rule. However, models cannot be designed by optimising their fit to true individual risk probabilities because the latter are not intrinsically known, nor can they be easily and accurately associated with an individual's data. Classification models are therefore usually trained on binary outcomes to provide an orderable or quantitative output, which can be dichotomised using a suitable cut-off value.

Model discrimination refers to accurate identification of actual outcomes. Model calibration, or goodness of fit, is related to the agreement between predicted probabilities and observed proportions and it is an important aspect to consider in evaluating the prognostic capacity of a risk model (Cook, 2008). Model calibration is independent of discrimination, since there are risk models with good discrimination but poor calibration. A well-calibrated model gives probability values that can be reliably associated with the true individual risk of outcomes.

Many models have recently been proposed for diagnostic purposes in a wide range of medical applications and they also provide reliable estimates of individual risk probabilities. Two different approaches have been used to predict patient risk. The first approach is based on estimation of risk probability by sophisticated mathematical and statistical methods, such as logistic regression, the Bayesian rule and artificial neural networks (Dreiseitl & Ohno-Machado, 2002; Fukunaga, 1990; Marshall et al., 1994). Despite their great accuracy, these models are unfortunately not widely used because they are hard to design and call for difficult calculations, often requiring dedicated software and computing knowledge that doctors do not welcome, besides being difficult to incorporate in clinical practice. The second approach creates scoring systems, in which the predictor variables are usually selected and scored subjectively by expert consensus or objectively using statistical methods (den Boer et al., 2005; Higgins et al., 1997; Vincent & Moreno, 2010).

