A. Appendix A

practice because h0ð Þ: usually is considered as unspecified in order to avoid the impact of

Figure 7. Kaplan-Meier estimates of the survival function from observed failure times, from model 1 and from model 2 (left panel). Observed longitudinal trajectories (the solid line) and predicted longitudinal trajectories (the dashed line) for

Based on the model of longitudinal regression in (4.2), we also draw the smooth and predicted longitudinal profiles for nine patients from the AIDS dataset as depicted in Figure 7 (right panel). The dot points are the true observed longitudinal values. The solid lines are the smooth longitudinal profiles using the loess smoother and the dashed lines are the predicted profiles of nine randomly selected individuals. Most of the predicted profiles are quite close to the

In this chapter, two joint models using a penalized spline with a truncated polynomial basis have been proposed to model a non-linear longitudinal outcome and a time-to-event data. The use of a truncated polynomial basis gives us an intuitive and obvious way to model non-linear longitudinal outcome. By adding some penalties for the coefficients of the knots and using linear mixed-

We have conducted a sensitivity analysis on the assumption of normality for either random effects or errors. The t-distribution with the degree of freedom 5 is applied for each of them. The results show that the estimates of parameters are sensitive when both of terms are not

The main findings we may derive from this chapter are, at least, threefold: (1) the ECM algorithm provides a reasonable quick convergence algorithm for the proposed models; (2) the fitted joint models are able to measure the association between the internal time-dependent

effects models, the smoothing is controlled and the individual curves are specified.

misspecifying the distribution of survival times.

the 12 randomly selected patients (right panel).

122 Topics in Splines and Applications

observed ones.

5. Discussion

normally distributed.

One sample of simulated data of the penalized spline joint model in (4.4) is presented in Table 4 for the first three patients. The subjects are measured bimonthly and the entry time is 0 for all


Table 4. A snapshot of simulated data for penalized spline joint model in (4.4).

subjects. Obstime variable includes the time points at which these measurements are recorded. Time variable includes the observed survival times when subject meets an event. x is a timeconstant binary random variable with parameter p ¼ 0:5. Column y contains the longitudinal responses. Death variable is the event status indicator. This variable receives value 1 when the true survival time is less than or equal to the censoring time and 0 otherwise. We define the four random effects variables which are <sup>Z</sup><sup>1</sup> <sup>¼</sup> ð Þ obstime � <sup>K</sup><sup>1</sup> <sup>þ</sup>, <sup>Z</sup><sup>2</sup> <sup>¼</sup> ð Þ obstime � <sup>K</sup><sup>2</sup> <sup>þ</sup>, <sup>Z</sup><sup>3</sup> <sup>¼</sup> <sup>ð</sup>obstime �K3Þþ and <sup>Z</sup><sup>4</sup> <sup>¼</sup> <sup>1</sup>. For the longitudinal process, there are 1902 of observations for 500 subjects. For each subject, 1-7 longitudinal measurements are recorded. On average, there are four longitudinal measurements per subject. For the event process, there are 297 subjects who meet for an event which is equivalent to 59.4% of the whole sample.

where Sð Þ θ is the score vector corresponding to parameter θ and the score vector has the form of

<sup>j</sup>bi; <sup>θ</sup>ð Þ it <sup>Þ</sup>pðbi; <sup>θ</sup>ð Þ it <sup>Þ</sup> n o:<sup>p</sup> <sup>b</sup>ijTi; <sup>δ</sup>i; <sup>y</sup><sup>i</sup>

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

SiðÞ¼ t exp ð Þ¼ �Hið Þt 1 � Fið Þt : (28)

ðt

0 @

hið Þs ds

0

h sð Þds:

ðt

hið Þs ds

9 >=

3 7 5,

>;

K<sup>1</sup>

u ¼ 1 � Fið Þ Ti , (29)

1 A:

> K ð1

> > 0

hið Þs ds þ

ðt

hið Þs ds.

K<sup>1</sup>

; <sup>θ</sup>ð Þ it � �dbi:

http://dx.doi.org/10.5772/intechopen.75975

0

hið Þs ds.

125

Sð Þ¼ θ

C. Appendix C

Following (5.4), we set

The condition t < K<sup>1</sup> is equal to

<sup>¼</sup> <sup>X</sup> i

<sup>∂</sup><sup>Q</sup> <sup>θ</sup>jθð Þ it � � ∂θ

ð ∂

and cumulative distribution Fið Þt , we have

<sup>∂</sup>θ<sup>T</sup> log p Ti; <sup>δ</sup>ijbi; <sup>θ</sup>ð Þ it � �pðy<sup>i</sup>

There are four cases for simulating survival time Ti of the model (4.1) as follows.

When the survival time <sup>t</sup> <sup>&</sup>lt; <sup>K</sup>1, we calculate the cumulative hazard function HiðÞ¼ <sup>t</sup> <sup>Ð</sup><sup>t</sup>

Based on the relation between the survival function Sið Þt , cumulative hazard function Hið Þt

where u is a random variable with u � Uni½ � 0; 1 . The survival time t is the solution of the equation

K ð1

0

hið Þs ds þ

U ¼ exp ð Þ¼ �Hið Þt exp �

� log ð Þ U <

K ð1

8 ><

>:

0

When K<sup>1</sup> ≤ t < K2, we calculate the cumulative hazard function HiðÞ¼ t

2 6 4

U ¼ exp �

where U is a value of u � Uni½ � 0; 1 . The condition K<sup>1</sup> ≤ t < K<sup>2</sup> is equal to

The survival time t is the solution of the equation

## B. Appendix B

The integrals with respect to the random effects in (3.7) do not have closed-form solutions. Therefore, in this chapter, we implement the Gaussian-Hermite quadrature rule as in Rizopoulos [5] to approximate the integrals. In our simulation study and R coding, we use the Gaussian-Hermite quadrature rule with 10 quadrature points.

The updating formulas of the parameters in Step 3 have different forms for each parameter following Rizopoulos [2]. We have the closed-form estimates for the measurement error variance σ<sup>2</sup> <sup>ε</sup> in the longitudinal model and the covariance matrix of the random effects as follows:

$$\hat{\mathbf{G}}^{(it+1)} = \frac{1}{n} \sum\_{i} \int \mathbf{b}\_{i}^{T} \mathbf{b}\_{i} p\left(\mathbf{b}\_{i} | T\_{i}, \delta\_{i}, \mathbf{y}\_{i}; \boldsymbol{\Theta}^{(it)}\right) d\mathbf{b}\_{i} = \frac{1}{N} \sum\_{i} v \tilde{\mathbf{b}}\_{i}^{(it)} + \tilde{\mathbf{b}}\_{i}^{(it)} \tilde{\mathbf{b}}\_{i}^{(it)} T\_{i} \tag{25}$$

where ~ b<sup>i</sup> ¼ E bijTi; δi; y<sup>i</sup> ; <sup>θ</sup> � � <sup>¼</sup> <sup>Ð</sup> bip bijTi; δi; y<sup>i</sup> ; <sup>θ</sup> � �db<sup>i</sup> and <sup>~</sup>vb<sup>i</sup> <sup>¼</sup> var <sup>b</sup>ijTi; <sup>δ</sup>i; <sup>y</sup><sup>i</sup> ; <sup>θ</sup> � � <sup>¼</sup> <sup>Ð</sup> ðbi� ~ biÞp bijTi; δi; y<sup>i</sup> ; θ � �dbi. The updating formula for σ<sup>2</sup> <sup>ε</sup> is

$$\widehat{\sigma}\_{\varepsilon(it+1)}^2 = \frac{1}{n} \sum\_{i} \int \mathbf{W}^T \mathbf{W} p\left(\mathfrak{b}\_i | T\_i, \delta\_i, \mathfrak{y}\_i; \mathfrak{G}^{(it)}\right) d\mathfrak{b}\_{i\cdot} \tag{26}$$

where W ¼ y<sup>i</sup> � Xiβ � Xiu<sup>i</sup> � Zivi.

Unfortunately, we cannot obtain closed-form expressions for the fixed effects β and the parameters of the survival submodel γ, α, and θ<sup>h</sup><sup>0</sup> . We thus employ the one-step Newton-Raphson approach to obtain the updated βð Þ itþ<sup>1</sup> , γð Þ itþ<sup>1</sup> ,αð Þ itþ<sup>1</sup> and θð Þ itþ<sup>1</sup> <sup>h</sup><sup>0</sup> . In particular, we have

$$\begin{aligned} S(\boldsymbol{\theta}) &= \frac{\partial \mathcal{Q}\Big(\boldsymbol{\theta} | \boldsymbol{\theta}^{(i)}\Big)}{\partial \boldsymbol{\theta}}\\ \widehat{\boldsymbol{\theta}}^{(i+1)} &= \widehat{\boldsymbol{\theta}}^{(i)} - \left[\frac{\partial \mathcal{S}\Big(\widehat{\boldsymbol{\theta}}^{(i)}\Big)}{\partial \boldsymbol{\theta}}\right]^{-1} \mathcal{S}\Big(\widehat{\boldsymbol{\theta}}^{(i)}\Big), \end{aligned} \tag{27}$$

where Sð Þ θ is the score vector corresponding to parameter θ and the score vector has the form of

$$\begin{split} S(\boldsymbol{\theta}) &= \frac{\partial Q\left(\boldsymbol{\theta}|\boldsymbol{\theta}^{(i)}\right)}{\partial \boldsymbol{\theta}} \\ &= \sum\_{i} \int \frac{\partial}{\partial \boldsymbol{\theta}^{T}} \log \left\{ p\left(T\_{i}, \delta\_{i}|\boldsymbol{b}\_{i}; \boldsymbol{\theta}^{(i)}\right) p(\boldsymbol{y}\_{i}|\boldsymbol{b}\_{i}; \boldsymbol{\theta}^{(i)}) p(\boldsymbol{b}\_{i}; \boldsymbol{\theta}^{(i)}) \right\} \cdot p\left(\boldsymbol{b}\_{i}|T\_{i}, \delta\_{i}, \boldsymbol{y}\_{i}; \boldsymbol{\theta}^{(i)}\right) d\boldsymbol{b}\_{i} .\end{split}$$

## C. Appendix C

subjects. Obstime variable includes the time points at which these measurements are recorded. Time variable includes the observed survival times when subject meets an event. x is a timeconstant binary random variable with parameter p ¼ 0:5. Column y contains the longitudinal responses. Death variable is the event status indicator. This variable receives value 1 when the true survival time is less than or equal to the censoring time and 0 otherwise. We define the four random effects variables which are <sup>Z</sup><sup>1</sup> <sup>¼</sup> ð Þ obstime � <sup>K</sup><sup>1</sup> <sup>þ</sup>, <sup>Z</sup><sup>2</sup> <sup>¼</sup> ð Þ obstime � <sup>K</sup><sup>2</sup> <sup>þ</sup>, <sup>Z</sup><sup>3</sup> <sup>¼</sup> <sup>ð</sup>obstime �K3Þþ and <sup>Z</sup><sup>4</sup> <sup>¼</sup> <sup>1</sup>. For the longitudinal process, there are 1902 of observations for 500 subjects. For each subject, 1-7 longitudinal measurements are recorded. On average, there are four longitudinal measurements per subject. For the event process, there are 297 subjects who meet for an

The integrals with respect to the random effects in (3.7) do not have closed-form solutions. Therefore, in this chapter, we implement the Gaussian-Hermite quadrature rule as in Rizopoulos [5] to approximate the integrals. In our simulation study and R coding, we use

The updating formulas of the parameters in Step 3 have different forms for each parameter following Rizopoulos [2]. We have the closed-form estimates for the measurement error vari-

<sup>ε</sup> in the longitudinal model and the covariance matrix of the random effects as follows:

<sup>ε</sup> is

<sup>W</sup><sup>T</sup>W<sup>p</sup> <sup>b</sup>ijTi; <sup>δ</sup>i; <sup>y</sup><sup>i</sup>

Unfortunately, we cannot obtain closed-form expressions for the fixed effects β and the parameters of the survival submodel γ, α, and θ<sup>h</sup><sup>0</sup> . We thus employ the one-step Newton-Raphson

> ∂S θbð Þ it � �

3 5

�1

<sup>S</sup> <sup>θ</sup>bð Þ it � � ,

∂θ

<sup>∂</sup><sup>Q</sup> <sup>θ</sup>jθð Þ it � � ∂θ

> 2 4

<sup>d</sup>b<sup>i</sup> <sup>¼</sup> <sup>1</sup> N X i v~ bð Þ it <sup>i</sup> <sup>þ</sup> <sup>~</sup> bð Þ it <sup>i</sup> <sup>~</sup> b ð Þ it

; <sup>θ</sup> � �db<sup>i</sup> and <sup>~</sup>vb<sup>i</sup> <sup>¼</sup> var <sup>b</sup>ijTi; <sup>δ</sup>i; <sup>y</sup><sup>i</sup>

; <sup>θ</sup>ð Þ it � �

<sup>i</sup> T, (25)

ðbi�

(27)

; <sup>θ</sup> � � <sup>¼</sup> <sup>Ð</sup>

dbi, (26)

<sup>h</sup><sup>0</sup> . In particular, we have

event which is equivalent to 59.4% of the whole sample.

the Gaussian-Hermite quadrature rule with 10 quadrature points.

<sup>i</sup> bip bijTi; δi; y<sup>i</sup>

bip bijTi; δi; y<sup>i</sup>

ð

; <sup>θ</sup>ð Þ it � �

B. Appendix B

124 Topics in Splines and Applications

<sup>G</sup><sup>b</sup> ð Þ itþ<sup>1</sup>

b<sup>i</sup> ¼ E bijTi; δi; y<sup>i</sup>

where W ¼ y<sup>i</sup> � Xiβ � Xiu<sup>i</sup> � Zivi.

¼ 1 n X i

; <sup>θ</sup> � � <sup>¼</sup> <sup>Ð</sup>

ð bT

; θ � �dbi. The updating formula for σ<sup>2</sup>

<sup>ε</sup>ð Þ itþ<sup>1</sup> <sup>¼</sup> <sup>1</sup> n X i

approach to obtain the updated βð Þ itþ<sup>1</sup> , γð Þ itþ<sup>1</sup> ,αð Þ itþ<sup>1</sup> and θð Þ itþ<sup>1</sup>

Sð Þ¼ θ

<sup>¼</sup> <sup>θ</sup>bð Þ it �

<sup>θ</sup>bð Þ itþ<sup>1</sup>

σb2

ance σ<sup>2</sup>

where ~

biÞp bijTi; δi; y<sup>i</sup>

~

There are four cases for simulating survival time Ti of the model (4.1) as follows.

When the survival time <sup>t</sup> <sup>&</sup>lt; <sup>K</sup>1, we calculate the cumulative hazard function HiðÞ¼ <sup>t</sup> <sup>Ð</sup><sup>t</sup> 0 hið Þs ds. Based on the relation between the survival function Sið Þt , cumulative hazard function Hið Þt and cumulative distribution Fið Þt , we have

$$S\_i(t) = \exp\left(-H\_i(t)\right) = 1 - F\_i(t). \tag{28}$$

Following (5.4), we set

$$
\mu = 1 - F\_i(T\_i) \tag{29}
$$

where u is a random variable with u � Uni½ � 0; 1 . The survival time t is the solution of the equation

$$\mathcal{U} = \exp\left(-H\_i(t)\right) = \exp\left(-\int\_0^t h\_i(s)ds\right).$$

The condition t < K<sup>1</sup> is equal to

$$-\log\left(U\right) < \int\_0^{\mathcal{K}\_1} h(s)ds.$$

When K<sup>1</sup> ≤ t < K2, we calculate the cumulative hazard function HiðÞ¼ t K ð1 0 hið Þs ds þ ðt K<sup>1</sup> hið Þs ds. The survival time t is the solution of the equation

$$\mathcal{U} = \exp\left[-\left\{\int\_0^{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_1} h\_i(s)ds\right\}\right],$$

where U is a value of u � Uni½ � 0; 1 . The condition K<sup>1</sup> ≤ t < K<sup>2</sup> is equal to

$$-\log\left(\mathcal{U}\right) < \int\_0^{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_1}^{\mathcal{K}\_2} h\_i(s)ds.$$

According to Ruppert et al. [9], the choice for knot position is

th sample quantile of the unique xi for <sup>k</sup> <sup>¼</sup> <sup>1</sup>, …, K.

Parameter True value Censored (20%) Censored (40%)

β<sup>0</sup> 5 4.85 0.30 0.25 5.10 0.30 0.27 β<sup>1</sup> 2 1.86 0.45 0.20 2.10 0.57 0.18 λ<sup>1</sup> 0.1 0.13 0.12 0.00 0.11 0.10 0.00 λ<sup>2</sup> 0.5 0.52 0.07 0.00 0.49 0.14 0.02 γ 0.5 0.48 0.10 0.00 0.51 0.09 0.00 α 0.05 0.05 0.02 0.00 0.04 0.04 0.00 σ 2 2.02 0.05 0.00 2.02 0.06 0.00 D<sup>11</sup> 2 2.21 0.67 0.17 2.27 0.80 0.22 D<sup>22</sup> 2 2.16 0.27 0.09 2.10 0.43 0.05 D<sup>33</sup> 2 2.26 0.27 0.01 2.22 0.60 0.10 D<sup>44</sup> 4 4.20 0.53 0.20 4.24 0.63 0.18

Estimate SD MSE Estimate SD MSE

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

http://dx.doi.org/10.5772/intechopen.75975

127

<sup>K</sup><sup>k</sup> <sup>¼</sup> <sup>k</sup>þ<sup>1</sup> Kþ2

<sup>K</sup> <sup>¼</sup> min <sup>1</sup>

E. Appendix E

Author details

rates.

Huong Thi Thu Pham<sup>1</sup>

South Australia, Australia

\* and Hoa Pham<sup>2</sup>

2 Mathematical Department, An Giang University, Long Xuyen, Viet Nam

1 School of Computer Science, Engineering and Mathematics, Flinders University, Adelaide,

Table 5. Summary statistics for parameter estimation of the simulated data of the model in (22) for different censoring

\*Address all correspondence to: pham0092@flinders.edu.au

See Table 5.

The simple choice of K is

<sup>4</sup> � number of unique xi; <sup>35</sup> .

When K<sup>2</sup> ≤ t < K3, we calculate the cumulative hazard function HiðÞ¼ t K ð1 0 hið Þs ds þ K ð2 K<sup>1</sup> hið Þs dsþ

Ðt K<sup>2</sup> hið Þs ds. The survival time t is the solution of the equation

$$\mathcal{U} = \exp\left[-\left\{\int\_0^{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_1}^{\mathcal{K}\_2} h\_i(s)ds + \int\_{\mathcal{K}\_2}^t h\_i(s)ds\right\}\right].$$

where U is a value of u � Uni½ � 0; 1 . The condition K<sup>2</sup> ≤ t < K<sup>3</sup> is equal to

$$-\log\left(lI\right) < \int\_0^{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_1}^{\mathcal{K}\_2} h\_i(s)ds + \int\_{\mathcal{K}\_2}^{\mathcal{K}\_3} h\_i(s)ds.$$

When K<sup>3</sup> ≤ t, the cumulative hazard function has the form HiðÞ¼ t K ð1 0 hið Þs ds þ K ð2 K<sup>1</sup> hið Þs dsþ

K Ð 3 K<sup>2</sup> hið Þ<sup>s</sup> ds <sup>þ</sup> <sup>Ð</sup><sup>t</sup> K<sup>3</sup> hið Þs ds. Survival time t is the solution of the equation

$$\mathcal{U} = \exp\left[-\left\{\int\_0^{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_1} h\_i(s)ds + \int\_{\mathcal{K}\_2}^{\mathcal{K}\_3} h\_i(s)ds + \int\_{\mathcal{K}\_3}^t h\_i(s)ds\right\}\right].$$

#### D. Appendix D

In particular, Ruppert et al. [9] introduced a default choices for knot location and number of knots. The idea is to choose sufficient knots to resolve the essential structure in the underlying regression function. But for more complicated penalized spline models, there are computational advantages to keeping the number of knots relatively low. A reasonable default is to choose the knots to ensure that there are a fixed number of unique observations, say 4–5, between each knot. For large data sets, this can lead to an excessive numbers of knots; therefore, a maximum number of allowable knots (say, 20–40 total) are recommended.

According to Ruppert et al. [9], the choice for knot position is

<sup>K</sup><sup>k</sup> <sup>¼</sup> <sup>k</sup>þ<sup>1</sup> Kþ2 th sample quantile of the unique xi for <sup>k</sup> <sup>¼</sup> <sup>1</sup>, …, K.

The simple choice of K is

<sup>K</sup> <sup>¼</sup> min <sup>1</sup> <sup>4</sup> � number of unique xi; <sup>35</sup> .
