4. Empirical results

This section presents two simulation studies for Model 1, whereas Model 2 will be applied for a case study only. In Section 4.1, we simulate data from Model 1 with three internal knots in the longitudinal submodel and Gompertz distribution for the baseline risk function. In Section 4.2, we simulate data from Model 1 having Gompertz distribution for the baseline risk function and non-linear logarithm subject-specific trajectories. The ECM algorithm, written in R code, is applied to estimate the true values of parameters in both cases.

#### 4.1. Simulation study 1

#### 4.1.1. Data description

Recall the penalized spline joint Model 1 of (2.10) with three internal knots in longitudinal submodel and Gompertz distribution for the baseline risk function in the form of

$$h\_i(t) = h\_0(t) \exp\left(\gamma \mathbf{x}\_i + a(m\_i(t))\right) = \lambda\_1 \exp\left(\lambda\_2 t\right) \exp\left\{\gamma \mathbf{x}\_i + am\_i(t)\right\},\tag{19}$$

where h0ð Þt is the hazard function at baseline having Gompertz distribution, xi is baseline covariate and mið Þt denotes the true and unobserved value of the longitudinal at time t. The form of mið Þt is given by

$$m\_i(t) = \beta\_0 + \beta\_1 t + \mu\_{i1}(t - \mathcal{K}\_1)\_+ + \mu\_{i2}(t - \mathcal{K}\_2)\_+ + \mu\_{i3}(t - \mathcal{K}\_3)\_+ + v\_{i0} \tag{20}$$

where b<sup>i</sup> ¼ ð Þ u11; u12; u13; vi<sup>0</sup> <sup>T</sup> is the vector of random effects and is assumed to have a normal distribution with mean zero and the diagonal covariance matrix D ¼ Diag Dð Þ <sup>11</sup>; D22; D33; D<sup>44</sup> . K1, K2, K<sup>3</sup> denote the three internal knots put into the model. The observed longitudinal value at time point t for the i th subject is of the form

$$y\_i(t) = m\_i(t) + \varepsilon\_i(t),\tag{21}$$

where the error variable εið Þt is assumed to come from a normal distribution with mean zero and variance σ2.

From this model, the vector of all parameters θ of the models in (4.1) and (4.2) is <sup>θ</sup> <sup>¼</sup> <sup>θ</sup><sup>T</sup> <sup>t</sup> ; θ<sup>T</sup> <sup>y</sup> ; θ<sup>T</sup> b � �<sup>T</sup> , where θ<sup>t</sup> ¼ ð Þ γ; α; λ1; λ<sup>2</sup> <sup>T</sup> denotes the parameter vector for the survival outcomes. Furthermore, <sup>θ</sup><sup>y</sup> <sup>¼</sup> <sup>β</sup>0; <sup>β</sup>1; <sup>σ</sup><sup>2</sup> ε � �<sup>T</sup> is the parameter vector for longitudinal outcomes and θ<sup>b</sup> ¼ D is the variance matrix of random effects.

To simulate the observed survival time Ti of the joint model in (4.1), we applied the methods adapted by Bender et al. [13], Austin [14] and Crowther and Lambert [15] to generate the true survival time. We further assumed that the censoring mechanism is exponentially distributed with parameter λ. The observed survival time was the minimum of the censoring time and the true survival time. We generated the survival time Ti for n ¼ 500 subjects with the parameters: β<sup>0</sup> ¼ 5, β<sup>1</sup> ¼ 2, λ<sup>1</sup> ¼ 0:1, λ<sup>2</sup> ¼ 0:5, γ ¼ 0:5,α ¼ 0:05, δ ¼ 2 and D ¼ Diagð Þ 2; 2; 2; 4 . Then we generated the longitudinal responses mið Þt using (4.2). The simulated model is therefore

$$\begin{cases} \begin{aligned} h\_i(t) &= 0.1 \exp\left(0.5t\right) \exp\left\{0.5x\_i + 0.05m\_i(t)\right\} \\ m\_i(t) &= 5 + 2t + u\_{i1}(t-1)\_+ + u\_{i2}(t-2)\_+ + u\_{i3}(t-3)\_+ + v\_{i0}. \end{aligned} \tag{22}$$

The sample of simulated data is presented in Appendix A. The curve of Kaplan-Meier estimate for the survival function of simulated data (left panel) and the longitudinal trajectories for the whole simulated sample (right panel) are presented in Figure 1. The dashed lines in the left panel correspond to 95% pointwise confidence intervals. It is clear from the plot of Kaplan-Meier estimator that the survival probability starts from 1 and decreases gradually until at the 5th month of the study. After this, it is nearly zero after 6 months or so. The right panel is the longitudinal trajectories for the first 100 subjects reflecting the form as in (4.2).

#### 4.1.2. Parameter estimation

<sup>b</sup>var <sup>θ</sup><sup>b</sup> � �

> ¼ �X<sup>n</sup> i¼1

This section presents two simulation studies for Model 1, whereas Model 2 will be applied for a case study only. In Section 4.1, we simulate data from Model 1 with three internal knots in the longitudinal submodel and Gompertz distribution for the baseline risk function. In Section 4.2, we simulate data from Model 1 having Gompertz distribution for the baseline risk function and non-linear logarithm subject-specific trajectories. The ECM algorithm, written in R code, is

Recall the penalized spline joint Model 1 of (2.10) with three internal knots in longitudinal

where h0ð Þt is the hazard function at baseline having Gompertz distribution, xi is baseline covariate and mið Þt denotes the true and unobserved value of the longitudinal at time t. The

distribution with mean zero and the diagonal covariance matrix D ¼ Diag Dð Þ <sup>11</sup>; D22; D33; D<sup>44</sup> . K1, K2, K<sup>3</sup> denote the three internal knots put into the model. The observed longitudinal value

where the error variable εið Þt is assumed to come from a normal distribution with mean zero

From this model, the vector of all parameters θ of the models in (4.1) and (4.2) is

th subject is of the form

, where θ<sup>t</sup> ¼ ð Þ γ; α; λ1; λ<sup>2</sup>

yi

hiðÞ¼ t h0ð Þt exp ðγxi þ αð Þ mið Þt Þ ¼ λ<sup>1</sup> exp ð Þ λ2t exp f g γxi þ αmið Þt , (19)

miðÞ¼ <sup>t</sup> <sup>β</sup><sup>0</sup> <sup>þ</sup> <sup>β</sup>1<sup>t</sup> <sup>þ</sup> ui1ð Þ <sup>t</sup> � <sup>K</sup><sup>1</sup> <sup>þ</sup> <sup>þ</sup> ui2ð Þ <sup>t</sup> � <sup>K</sup><sup>2</sup> <sup>þ</sup> <sup>þ</sup> ui3ð Þ <sup>t</sup> � <sup>K</sup><sup>3</sup> <sup>þ</sup> <sup>þ</sup> vi0, (20)

<sup>T</sup> is the vector of random effects and is assumed to have a normal

ðÞ¼ t mið Þþ t εið Þt , (21)

<sup>T</sup> denotes the parameter vector for the survival

submodel and Gompertz distribution for the baseline risk function in the form of

I θb � �

applied to estimate the true values of parameters in both cases.

where

4. Empirical results

114 Topics in Splines and Applications

4.1. Simulation study 1

form of mið Þt is given by

where b<sup>i</sup> ¼ ð Þ u11; u12; u13; vi<sup>0</sup>

at time point t for the i

and variance σ2.

<sup>t</sup> ; θ<sup>T</sup> <sup>y</sup> ; θ<sup>T</sup> b � �<sup>T</sup>

<sup>θ</sup> <sup>¼</sup> <sup>θ</sup><sup>T</sup>

4.1.1. Data description

<sup>¼</sup> <sup>I</sup> <sup>θ</sup><sup>b</sup> n o � � �<sup>1</sup>

∂Sið Þ θ ∂θ

� � � � � <sup>θ</sup>¼θ^ :

,

The ECM algorithm, as described in Section 3.2, is now implemented to estimate all parameters in (4.4). The initial values of the parameters were set at β<sup>0</sup> ¼ 1, β<sup>1</sup> ¼ 1, λ<sup>1</sup> ¼ 0:05, λ<sup>2</sup> ¼ 0:1, γ ¼ 0:1, α ¼ 0:01, σ ¼ 1, D<sup>11</sup> ¼ 3, D<sup>22</sup> ¼ 3, D<sup>33</sup> ¼ 3, D<sup>44</sup> ¼ 3, respectively. However, these initial values can also be set randomly. The traces of each of these parameters are presented in Figures 2 and 3, respectively. The traces of estimates show the way how the algorithm updates

Figure 1. Kaplan-Meier estimate of the survival function of the simulated data of (4.4) (left panel). Longitudinal trajectories of the first 100 subjects from the simulated sample of (4.4) (right panel).

We now run the simulation for 30 independent samples with different sample sizes (n ¼ 200, 300 and 500). Then, we calculate the means, standard deviations (SD) and mean square error (MSE) of parameters as presented in Table 1. The point estimates of each parameter are reasonably close to the true values when the sample sizes are 300 and 500. This is also supported by the values of SD and MSE which decrease gradually when the sample size increases. In addition to this, we also compare the parameter estimates with different censoring rates (20% and 40%) for a sample size of 500 in 5, Appendix E. The result shows that when the sample size is large the

We now perform a simulation study on proportional hazard model having Gompertz distribution at baseline and non-linear subject-specific trajectory. In particular, the model is in the

where h0ð Þt is the hazard function at baseline having Gompertz distribution, xi is baseline covariate and mið Þt denotes the true and unobserved value of the longitudinal at time t. The

β<sup>0</sup> 5 4.21 0.72 0.76 4.68 0.50 0.32 5.10 0.30 0.27 β<sup>1</sup> 2 1.69 0.75 0.57 1.86 0.75 0.28 2.10 0.57 0.18 λ<sup>1</sup> 0.1 0.12 0.13 0.00 0.12 0.12 0.00 0.11 0.10 0.00 λ<sup>2</sup> 0.5 0.50 0.15 0.02 0.57 0.14 0.01 0.48 0.14 0.02 γ 0.5 0.50 0.17 0.03 0.49 0.12 0.04 0.51 0.09 0.01 α 0.05 0.03 0.04 0.00 0.04 0.05 0.00 0.04 0.04 0.00 σ 2 2.06 0.13 0.01 2.02 0.06 0.00 2.02 0.06 0.00 D<sup>11</sup> 2 2.87 0.92 0.62 2.59 0.73 0.53 2.27 0.80 0.22 D<sup>22</sup> 2 2.03 0.42 0.16 2.21 0.46 0.23 2.10 0.43 0.05 D<sup>33</sup> 2 2.10 0.37 0.17 0.34 0.50 0.34 2.22 0.59 0.10 D<sup>44</sup> 4 5.24 1.82 0.76 4.32 0.74 0.60 4.24 0.63 0.18

Table 1. Summary statistics for parameter estimation of the simulated data of the model in (4.4) for different sample sizes.

hiðÞ¼ t h0ð Þt exp ðγxi þ αð Þ mið Þt Þ ¼ λ<sup>1</sup> exp ð Þ λ2t exp f g γxi þ αmið Þt , (23)

th subject has the non-linear form

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

http://dx.doi.org/10.5772/intechopen.75975

117

<sup>¼</sup> 5 log 1ð Þþ <sup>þ</sup> <sup>t</sup> bi1<sup>t</sup> <sup>þ</sup> bi<sup>0</sup> <sup>þ</sup> <sup>ε</sup>ið Þ<sup>t</sup> , (24)

Estimate SD MSE Estimate SD MSE Estimate SD MSE

censoring rate has little influence on the estimates.

observed longitudinal value at time point t for the i

yi

ðÞ¼ t mið Þþ t εið Þt

Parameter True value n ¼ 200 n ¼ 300 n ¼ 500

4.2. Simulation study 2

4.2.1. Data description

form of

Figure 2. The traces of parameters β0, β1, λ1, λ2, γ, α for 100 iterations.

Figure 3. The traces of parameters σ, D11, D22, D33, D<sup>44</sup> for 100 iterations.

new values of the parameters. In addition, they also demonstrate the convergence of the algorithm after 10–20 iterations. In particular, the parameters β0, β1, λ2, α, σ, D<sup>22</sup> and D<sup>33</sup> converge linearly to the true values while the parameters λ1, γ, D11, and D<sup>44</sup> oscillate before converging to the true values.

We now run the simulation for 30 independent samples with different sample sizes (n ¼ 200, 300 and 500). Then, we calculate the means, standard deviations (SD) and mean square error (MSE) of parameters as presented in Table 1. The point estimates of each parameter are reasonably close to the true values when the sample sizes are 300 and 500. This is also supported by the values of SD and MSE which decrease gradually when the sample size increases. In addition to this, we also compare the parameter estimates with different censoring rates (20% and 40%) for a sample size of 500 in 5, Appendix E. The result shows that when the sample size is large the censoring rate has little influence on the estimates.

#### 4.2. Simulation study 2

#### 4.2.1. Data description

new values of the parameters. In addition, they also demonstrate the convergence of the algorithm after 10–20 iterations. In particular, the parameters β0, β1, λ2, α, σ, D<sup>22</sup> and D<sup>33</sup> converge linearly to the true values while the parameters λ1, γ, D11, and D<sup>44</sup> oscillate before

Figure 2. The traces of parameters β0, β1, λ1, λ2, γ, α for 100 iterations.

116 Topics in Splines and Applications

Figure 3. The traces of parameters σ, D11, D22, D33, D<sup>44</sup> for 100 iterations.

converging to the true values.

We now perform a simulation study on proportional hazard model having Gompertz distribution at baseline and non-linear subject-specific trajectory. In particular, the model is in the form of

$$h\_i(t) = h\_0(t) \exp\left(\gamma \mathbf{x}\_i + a(m\_i(t))\right) = \lambda\_1 \exp\left(\lambda\_2 t\right) \exp\left\{\gamma \mathbf{x}\_i + am\_i(t)\right\},\tag{23}$$

where h0ð Þt is the hazard function at baseline having Gompertz distribution, xi is baseline covariate and mið Þt denotes the true and unobserved value of the longitudinal at time t. The observed longitudinal value at time point t for the i th subject has the non-linear form



Table 1. Summary statistics for parameter estimation of the simulated data of the model in (4.4) for different sample sizes.

where <sup>ε</sup>iðÞ� <sup>t</sup> <sup>N</sup> <sup>0</sup>; <sup>σ</sup><sup>2</sup> . In the model of (4.6), the mean longitudinal response of the population is assumed to have a non-linear logarithm curve. Different subjects are assumed to have different intercepts and slopes. In particular, we assume that bi ¼ ð Þ bi0; bi<sup>1</sup> <sup>T</sup> having a bivariate normal distribution with mean μ ¼ ð Þ 3; 2 and covariance matrix D ¼ Diagð Þ 1; 1 . The true values of the other parameters we put in the model were λ<sup>1</sup> ¼ 0:01, λ<sup>2</sup> ¼ 0:1, γ ¼ 0:5, α ¼ 0:2, σ ¼ 2, respectively. In addition, the censoring mechanism is assumed exponentially distributed with a parameter of λ ¼ 0:25.

Based on the model in (4.5) and the simulation study 1, we simulated survival times Ti for 500 subjects with 35% censoring rate. In particular, the ending time for the study was 5 months and all subjects alive by the end of the study (i.e. time 5) were assumed to be censored. This design was also reflected of many clinical studies in real life. In this sample, there were 329 uncensored subjects and 1387 observations for 500 subjects. For each subject, 1–5 longitudinal measurements were recorded. On average, there were three longitudinal measurements per subject. In Figure 4, the Kaplan-Meier estimate for survival curve is presented for the simulated data of (4.5) with 95% pointwise confidence intervals in the left panel. Moreover, the subject-specific longitudinal profiles for six randomly selected subjects is drawn in the right panel. It can be seen that some of the subjects in this dataset showed non-linear evolutions in their longitudinal values. Each subject has its own trajectory.

The results for parameter estimation are presented in Table 2. The means, standard deviations and 95% confidence intervals of parameter estimates are calculated for 30 independent samples. The point estimates for λ1, λ2, γ, α, σ<sup>2</sup> are reasonably close to the true values. Simi-

Table 2. Summary statistics for parameter estimation of the simulated data of the model in (4.5) and (4.6).

Parameter True value Estimate SD 95% CI β<sup>0</sup> — 3.399 0.673 [3.158;3.640] β<sup>1</sup> — 4.330 0.142 [4.280;4.380] λ<sup>1</sup> 0.01 0.013 0.021 [0.007;0.021] λ<sup>2</sup> 0.1 0.083 0.184 [0.017;0.148] γ 0.5 0.640 0.386 [0.486;0.778] α 0.2 0.186 0.142 [0.136;0.237] σ 2 1.993 0.061 [1.971;2.015] D<sup>11</sup> — 0.977 0.190 [0.909;1.044] D<sup>22</sup> — 1.365 0.183 [1.300;1.430] D<sup>33</sup> — 1.976 0.154 [1.921;2.031] D<sup>44</sup> — 1.393 0.196 [1.322;1.463]

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

http://dx.doi.org/10.5772/intechopen.75975

119

Based on the estimated values of parameters, we generate back the estimated survival time by approximating values of random effects from linear mixed-effects function. The detail of the generation is explained in Appendix C. Then, we use the Kaplan-Meier estimate to compare between the survival function of the simulated dataset (the black solid line) and the estimated survival function (the dashed line) which are presented in the left panel of

Moreover, we also draw the smooth and predicted longitudinal profiles for 12 patients chosen randomly in the right panel of Figure 5. The dot points are the true observed longitudinal values from simulated data. The solid lines are the smooth longitudinal profiles of the true observed longitudinal values using the loess smoother and the dashed lines are the predicted profiles of 12 randomly selected individuals. It is clear that the Kaplan-Meier estimates from simulated data overlaps the Kaplan-Meier estimates based on the predicted value in the left panel of Figure 4. The penalized spline regression model in (2.10) was a good fit for subject-

In summary, simulation studies have shown the stability of the algorithm and the goodness of fit of the penalized spline models. From the simulation study 1, it is shown that the updating process through the ECM algorithm converges quickly to the true values of the parameters. In addition, the simulation study 2 shows that the model can well predict the survival function

larly, the 95% CIs include the true values of λ1, λ2, γ, α, σ2.

specific curves in the right panel of Figure 5.

and individual trajectories respectively.

Figure 5.

#### 4.2.2. Parameter estimation

We will be using Model 1 in (4.1) and in (4.2) to handle the non-linear longitudinal trajectory in the simulated data in (4.5). In this model, we put three internal knots at 25, 50 and 75%, respectively, of the follow up time. Then, the ECM algorithm, as explained in Section 3, is implemented once again to estimate all parameters in the model.

Figure 4. Kaplan-Meier estimate of the survival function of the simulated data of (4.5) (left panel). Longitudinal trajectories for the six randomly selected subjects of (4.6) (right panel).

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data http://dx.doi.org/10.5772/intechopen.75975 119


where <sup>ε</sup>iðÞ� <sup>t</sup> <sup>N</sup> <sup>0</sup>; <sup>σ</sup><sup>2</sup> . In the model of (4.6), the mean longitudinal response of the population is assumed to have a non-linear logarithm curve. Different subjects are assumed to have

normal distribution with mean μ ¼ ð Þ 3; 2 and covariance matrix D ¼ Diagð Þ 1; 1 . The true values of the other parameters we put in the model were λ<sup>1</sup> ¼ 0:01, λ<sup>2</sup> ¼ 0:1, γ ¼ 0:5, α ¼ 0:2, σ ¼ 2, respectively. In addition, the censoring mechanism is assumed exponentially

Based on the model in (4.5) and the simulation study 1, we simulated survival times Ti for 500 subjects with 35% censoring rate. In particular, the ending time for the study was 5 months and all subjects alive by the end of the study (i.e. time 5) were assumed to be censored. This design was also reflected of many clinical studies in real life. In this sample, there were 329 uncensored subjects and 1387 observations for 500 subjects. For each subject, 1–5 longitudinal measurements were recorded. On average, there were three longitudinal measurements per subject. In Figure 4, the Kaplan-Meier estimate for survival curve is presented for the simulated data of (4.5) with 95% pointwise confidence intervals in the left panel. Moreover, the subject-specific longitudinal profiles for six randomly selected subjects is drawn in the right panel. It can be seen that some of the subjects in this dataset showed non-linear evolutions in their longitudinal values. Each

We will be using Model 1 in (4.1) and in (4.2) to handle the non-linear longitudinal trajectory in the simulated data in (4.5). In this model, we put three internal knots at 25, 50 and 75%, respectively, of the follow up time. Then, the ECM algorithm, as explained in Section 3, is

Figure 4. Kaplan-Meier estimate of the survival function of the simulated data of (4.5) (left panel). Longitudinal trajecto-

implemented once again to estimate all parameters in the model.

ries for the six randomly selected subjects of (4.6) (right panel).

<sup>T</sup> having a bivariate

different intercepts and slopes. In particular, we assume that bi ¼ ð Þ bi0; bi<sup>1</sup>

distributed with a parameter of λ ¼ 0:25.

118 Topics in Splines and Applications

subject has its own trajectory.

4.2.2. Parameter estimation

Table 2. Summary statistics for parameter estimation of the simulated data of the model in (4.5) and (4.6).

The results for parameter estimation are presented in Table 2. The means, standard deviations and 95% confidence intervals of parameter estimates are calculated for 30 independent samples. The point estimates for λ1, λ2, γ, α, σ<sup>2</sup> are reasonably close to the true values. Similarly, the 95% CIs include the true values of λ1, λ2, γ, α, σ2.

Based on the estimated values of parameters, we generate back the estimated survival time by approximating values of random effects from linear mixed-effects function. The detail of the generation is explained in Appendix C. Then, we use the Kaplan-Meier estimate to compare between the survival function of the simulated dataset (the black solid line) and the estimated survival function (the dashed line) which are presented in the left panel of Figure 5.

Moreover, we also draw the smooth and predicted longitudinal profiles for 12 patients chosen randomly in the right panel of Figure 5. The dot points are the true observed longitudinal values from simulated data. The solid lines are the smooth longitudinal profiles of the true observed longitudinal values using the loess smoother and the dashed lines are the predicted profiles of 12 randomly selected individuals. It is clear that the Kaplan-Meier estimates from simulated data overlaps the Kaplan-Meier estimates based on the predicted value in the left panel of Figure 4. The penalized spline regression model in (2.10) was a good fit for subjectspecific curves in the right panel of Figure 5.

In summary, simulation studies have shown the stability of the algorithm and the goodness of fit of the penalized spline models. From the simulation study 1, it is shown that the updating process through the ECM algorithm converges quickly to the true values of the parameters. In addition, the simulation study 2 shows that the model can well predict the survival function and individual trajectories respectively.

Figure 5. Kaplan-Meier estimate of the survival function from simulated failure times (the solid line) with 95% confidence intervals (dot lines), from Model 1 (4.5) (the dashed line) (left panel). Observed longitudinal trajectories (the solid line) and predicted longitudinal trajectories (the dashed line) for the twelve randomly selected subjects (right panel).

#### 4.3. The AIDS data set

In the AIDS dataset, there were 467 patients with advanced human immunodeficiency virus infection during antiretroviral treatment who had failed or were intolerant to zidovudine therapy. Patients in the study were randomly assigned to receive either didanosine drug (ddI) or zalcitabine drug (ddC). CD4 cells are a type of white blood cells made in the spleen, lymph nodes and thymus gland and are part of the infection-fighting system. CD4 cell counts were recorded at the time of study entry as well as at 2, 6, 12 and 18 months thereafter. The detail regarding the design of this study can be found in Abrams et al. [16]. By the end of the study, there were 188 patients died, resulting in about 59.7% censoring. There were 1405 longitudinal responses recorded.

Previously, Rizopoulos [2] used his standard joint model for the AIDS data which consider the variability between subjects mostly depend on the intercept. However, the model could not predict observed longitudinal data accurately. When the time unit is changed from month to year in the data, the variability between subjects depends not only on the intercept but also on the obstime variable. In addition, the longitudinal trajectories plot also shows many non-linear curves as depicted in the right panel of Figure 6.

Given the non-linearity, it is appropriate to apply our models, Model 1 and Model 2, for the AIDS data. In particular, we use the two joint models in (2.10) and (2.11) with the four internal knots are placed at 20, 40, 60, 80%, respectively of the observed failure times for hazard rate at baseline. Then, the ECM algorithm is implemented to estimate all parameters in the two models. A summary of statistics for parameter estimation using Model 1 and Model 2 is presented in Table 3.

Following Rizopoulos [2], in Model 1 and Model 2, the univariate Wald tests are applied for the fixed effects β ¼ β0; β<sup>1</sup> <sup>T</sup> in the longitudinal submodel, the regression coefficient γ and the association parameter α respectively. The results from Table 3 show that the point estimates of β0, β1, γ, α are all statistically significant for both models at a significance level of 5%.

We conduct the Kaplan-Meier estimate of the survival function from the observed survival time (the light solid line) and the dot lines correspond to 95% pointwise confidence intervals in Figure 6 (left panel). The predicted survival function from Model 1 is the dashed line and the predicted survival function from Model 2 is the bold solid line. The plots show that Model 2 works very well in this case as shown in Figure 7. Moreover, Model 2 is also preferred in

Table 3. Summary statistics for parameter estimation of the AIDS data of Model 1 and Model 2 respectively.

D<sup>22</sup> 0.97 0.06 D<sup>33</sup> 0.99 0.07 D<sup>44</sup> 11.40 0.75

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

http://dx.doi.org/10.5772/intechopen.75975

121

Figure 6. Kaplan-Meier estimate of the survival function of the AIDS data (left panel). Longitudinal trajectories for CD4

Parameter Estimate Std. error z-value p-value Parameter Estimate Std. error z-value p-value β<sup>0</sup> 7.87 0.06 127.07 <0.001 β<sup>0</sup> 7.81 0.07 114.34 <0.001 β<sup>1</sup> 1.69 0.11 14.77 <0.001 β<sup>1</sup> 1.62 0.12 12.99 <0.001 γ 0.22 0.11 2.06 0.039 γ 0.31 0.10 3.03 0.002 α 0.20 0.01 15.84 <0.001 α 0.24 0.01 18.15 <0.001

cell count of the first 100 patients for two groups (right panel).

Model 1 Model 2

λ<sup>1</sup> 1.68 0.07 λ<sup>1</sup> 1.04 0.11 λ<sup>2</sup> 0.33 0.00 λ<sup>2</sup> 1.79 0.23 σ 2.36 0.36 λ<sup>3</sup> 1.38 0.38 D<sup>11</sup> 2.18 0.14 λ<sup>4</sup> 1.67 0.42 D<sup>22</sup> 1.04 0.07 λ<sup>5</sup> 2.48 0.66 D<sup>33</sup> 0.85 0.06 σ 2.62 0.45 D<sup>44</sup> 11.87 0.78 D<sup>11</sup> 1.02 0.07 Penalized Spline Joint Models for Longitudinal and Time-To-Event Data http://dx.doi.org/10.5772/intechopen.75975 121

Figure 6. Kaplan-Meier estimate of the survival function of the AIDS data (left panel). Longitudinal trajectories for CD4 cell count of the first 100 patients for two groups (right panel).

4.3. The AIDS data set

120 Topics in Splines and Applications

In the AIDS dataset, there were 467 patients with advanced human immunodeficiency virus infection during antiretroviral treatment who had failed or were intolerant to zidovudine therapy. Patients in the study were randomly assigned to receive either didanosine drug (ddI) or zalcitabine drug (ddC). CD4 cells are a type of white blood cells made in the spleen, lymph nodes and thymus gland and are part of the infection-fighting system. CD4 cell counts were recorded at the time of study entry as well as at 2, 6, 12 and 18 months thereafter. The detail regarding the design of this study can be found in Abrams et al. [16]. By the end of the study, there were 188 patients died,

Figure 5. Kaplan-Meier estimate of the survival function from simulated failure times (the solid line) with 95% confidence intervals (dot lines), from Model 1 (4.5) (the dashed line) (left panel). Observed longitudinal trajectories (the solid line) and

predicted longitudinal trajectories (the dashed line) for the twelve randomly selected subjects (right panel).

Previously, Rizopoulos [2] used his standard joint model for the AIDS data which consider the variability between subjects mostly depend on the intercept. However, the model could not predict observed longitudinal data accurately. When the time unit is changed from month to year in the data, the variability between subjects depends not only on the intercept but also on the obstime variable. In addition, the longitudinal trajectories plot also shows many non-linear

Given the non-linearity, it is appropriate to apply our models, Model 1 and Model 2, for the AIDS data. In particular, we use the two joint models in (2.10) and (2.11) with the four internal knots are placed at 20, 40, 60, 80%, respectively of the observed failure times for hazard rate at baseline. Then, the ECM algorithm is implemented to estimate all parameters in the two models. A summary of statistics for parameter estimation using Model 1 and Model 2 is presented in Table 3.

Following Rizopoulos [2], in Model 1 and Model 2, the univariate Wald tests are applied for

association parameter α respectively. The results from Table 3 show that the point estimates of β0, β1, γ, α are all statistically significant for both models at a significance level of 5%.

<sup>T</sup> in the longitudinal submodel, the regression coefficient γ and the

resulting in about 59.7% censoring. There were 1405 longitudinal responses recorded.

curves as depicted in the right panel of Figure 6.

the fixed effects β ¼ β0; β<sup>1</sup>


Table 3. Summary statistics for parameter estimation of the AIDS data of Model 1 and Model 2 respectively.

We conduct the Kaplan-Meier estimate of the survival function from the observed survival time (the light solid line) and the dot lines correspond to 95% pointwise confidence intervals in Figure 6 (left panel). The predicted survival function from Model 1 is the dashed line and the predicted survival function from Model 2 is the bold solid line. The plots show that Model 2 works very well in this case as shown in Figure 7. Moreover, Model 2 is also preferred in

covariates and the risk for an event and (3) the two models provide a good prediction for both

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

http://dx.doi.org/10.5772/intechopen.75975

123

The limitations of this study are, at least, threefold: (1) the number of internal knots is limited to three due to computational costs; (2) the polynomial power functions can form an illconditioned basis for the models and (3) the estimation results are sensitive when both random

Based on the limitations, our future work will focus on using new methods for approximating the integrals to reduce the computational problems or relaxing the normality assumption. Furthermore, we will apply a different basis for joint models, that is the penalized B-spline. In terms of parameter estimation, we are considering a different approach to estimate the parameters in the models using a Bayesian approach, via Markov chain Monte Carlo (MCMC)

One sample of simulated data of the penalized spline joint model in (4.4) is presented in Table 4 for the first three patients. The subjects are measured bimonthly and the entry time is 0 for all

Id Obstime Time x y Death Z<sup>1</sup> Z<sup>2</sup> Z<sup>3</sup> Z<sup>4</sup> 1 0.0 4.97 0 1.41 1 0.0 0.0 0.0 1 1 0.5 4.97 0 6.45 1 0.0 0.0 0.0 1 1 1.0 4.97 0 4.10 1 0.0 0.0 0.0 1 1 1.5 4.97 0 1.50 1 0.5 0.0 0.0 1 1 2.0 4.97 0 4.07 1 1.0 0.0 0.0 1 1 2.5 4.97 0 6.16 1 1.5 0.5 0.0 1 1 3.0 4.97 0 3.60 1 2.0 1.0 0.0 1 1 3.5 4.97 0 8.32 1 2.5 1.5 0.5 1 1 4.0 4.97 0 6.32 1 3.0 2.0 1.0 1 2 0.0 2.79 0 6.81 1 0.0 0.0 0.0 1 2 0.5 2.79 0 7.77 1 0.0 0.0 0.0 1 2 1.0 2.79 0 9.75 1 0.0 0.0 0.0 1 2 1.5 2.79 0 11.04 1 0.5 0.0 0.0 1 2 2.0 2.79 0 7.20 1 1.0 0.0 0.0 1 3 0.0 1.90 0 1.84 0 0.0 0.0 0.0 1 3 0.5 1.90 0 1.12 0 0.0 0.0 0.0 1 3 1.0 1.90 0 0.78 0 0.0 0.0 0.0 1

Table 4. A snapshot of simulated data for penalized spline joint model in (4.4).

the longitudinal and survival functions, as presented in empirical results.

effects and error are not normally distributed.

algorithms.

A. Appendix A

Figure 7. Kaplan-Meier estimates of the survival function from observed failure times, from model 1 and from model 2 (left panel). Observed longitudinal trajectories (the solid line) and predicted longitudinal trajectories (the dashed line) for the 12 randomly selected patients (right panel).

practice because h0ð Þ: usually is considered as unspecified in order to avoid the impact of misspecifying the distribution of survival times.

Based on the model of longitudinal regression in (4.2), we also draw the smooth and predicted longitudinal profiles for nine patients from the AIDS dataset as depicted in Figure 7 (right panel). The dot points are the true observed longitudinal values. The solid lines are the smooth longitudinal profiles using the loess smoother and the dashed lines are the predicted profiles of nine randomly selected individuals. Most of the predicted profiles are quite close to the observed ones.
