2. The penalized spline joint models

In modern survival analysis, the Cox model [1] has long been a popular choice for time-independent covariates. These models measure the effect of time-independent covariates on the hazard rate for an event. Subsequently, the extended Cox model was developed for external time-dependent covariates. However, these latter models cannot handle longitudinal biomarkers. Therefore, Rizopoulos [2] introduced joint models for internal time-dependent covariates and the risk for an event, based on linear mixed-effects models and relative risk models.

The basic assumption of the standard joint models proposed by Rizopoulos [2] is that the hazard rate of the dropout process at a given time is associated with the expected value of the longitudinal response at the same time. The whole history of the response has an influence on the survival function. Thus, it is crucial to obtain good estimates of the subject-specific trajectories in order to estimate the survival function accurately. In addition, an important feature that we need to account for is that many subjects in a sample show non-linear and fluctuating longitudinal trajectories over time, and each subject has its own trajectory. Therefore, flexibility is needed in the subject-specific longitudinal submodels in the joint models to improve the predictions.

There are several previous works that flexibly model the subject-specific longitudinal profiles in joint models. Brown et al. [3] applied B-splines with multidimensional random effects; in particular, they assumed that the subject and population trajectories have the same number of basis functions. As a result, the number of parameters in the longitudinal submodel is quite large, and controlling the roughness of the fit increases the computational burden, especially when the dimension of the random effects vector is large. Ding and Wang [4] proposed the use of B-splines with a single multiplicative random effect to link the population mean function with the subject-specific profile. This simple model allows easy parameter estimation, but may not be appropriate for many practical applications [5]. Rizopoulos [5] considered more flexible models using natural cubic splines with an expanded random effects vector. However, the roughness of the fit is still not mentioned in these models.

In this chapter, we present new approaches to model non-linear shapes of subject-specific evolutions for joint models by extending the standard joint models of Rizopoulos [2]. In particular, we implement penalized splines using a truncated polynomial basis for the longitudinal submodel. Following this, the linear mixed-effects approach is applied to model the individual trajectories and to impose smoothness over adjacent coefficients. The ECM algorithm is used for parameter estimation, and the corresponding standard errors are calculated using the observed information matrix. However, as the matrices of random-effects covariates in our models differ from those in the standard joint models, the JM package of Rizopoulos [6] cannot be used for our models. Therefore, a set of R programs is written to implement the penalized spline joint models.

The chapter is organized as follows. Section 2 describes the penalized splines with truncated polynomial basis for the joint models. In this section, two models are specified: a penalized spline joint model with hazard rate at baseline having a Gompertz distribution (referred to as Model 1) and a penalized spline joint model with a piecewise-constant baseline risk function (referred to as Model 2). Section 3 presents parameter estimation for the two models. The remaining sections illustrate the proposed procedures on simulated data and a case study, respectively.


106 Topics in Splines and Applications


In this section, we introduce the joint models using penalized splines with a truncated polynomial basis. The proposed parametrization is based on the standard joint models of Rizopoulos [2] and on penalized spline regression for a longitudinal response.

Notation in this section follows Rizopoulos [2]. Let $T_i^*$ be the true survival time and $C_i$ the censoring time for the $i$th subject ($i = 1, \dots, n$). $T_i$ denotes the observed failure time for the $i$th subject, defined as $T_i = \min(T_i^*, C_i)$. If the $i$th subject is not censored, meaning that we have observed its survival time, then $T_i^* \le C_i$. If the $i$th subject is censored, meaning that it is lost to follow-up or has died from other causes, then $T_i^* > C_i$. Furthermore, we define the event indicator as $\delta_i = I(T_i^* \le C_i)$. The observed data for the survival outcome are $(T_i, \delta_i)$, $i = 1, \dots, n$.
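The censoring convention above can be made concrete with a small sketch; the survival and censoring times below are hypothetical values of ours, not data from the chapter.

```python
# Construct observed failure times T_i = min(T_i*, C_i) and event
# indicators delta_i = I(T_i* <= C_i) from hypothetical times.
true_times = [2.3, 5.1, 1.7, 4.0]   # T_i* (true survival times)
cens_times = [3.0, 4.5, 2.0, 3.5]   # C_i  (censoring times)

observed = [min(t, c) for t, c in zip(true_times, cens_times)]    # T_i
events   = [int(t <= c) for t, c in zip(true_times, cens_times)]  # delta_i

print(observed)  # [2.3, 4.5, 1.7, 3.5]
print(events)    # [1, 0, 1, 0]
```

Subjects 2 and 4 are censored ($T_i^* > C_i$), so their observed times are the censoring times and their indicators are zero.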

For the longitudinal response, suppose that we have $n$ subjects in the sample and that the observed longitudinal outcome for subject $i$ at time point $t$ is $y_i(t)$. We measure the $i$th subject at $n_i$ time points, so the longitudinal data consist of the measurements $y_{ij} = y_i(t_{ij})$, $j = 1, \dots, n_i$, taken at time points $t_{ij}$. We denote the true and unobserved value of the longitudinal outcome at time $t$ by $m_i(t)$. We assume the relation between $y_i(t)$ and $m_i(t)$ to be

$$y\_i(t) = m\_i(t) + \varepsilon\_i(t),\tag{1}$$

where $\varepsilon_i(t) \sim N(0, \sigma_\varepsilon^2)$.

When the survival function $S(t)$ is assumed to have a specific parametric form associated with a longitudinal submodel, estimation of the parameters of interest is usually based on the likelihood function [2]. In the maximum likelihood method, different types of covariates in the longitudinal submodel are treated differently. Here, we present the two categories of time-dependent covariates; the estimation techniques for these covariates are introduced in the following sections. We let the time-dependent covariate for the $i$th subject at time $t$ be denoted by $y_i(t)$, and we let $\mathcal{Y}_i(t) = \{y_i(s); 0 \le s < t\}$ denote the covariate history of the $i$th subject up to time $t$. According to Kalbfleisch and Prentice [7], exogenous covariates are the covariates satisfying the condition:

$$\Pr(\mathbf{s} \le T\_i < \mathbf{s} + d\mathbf{s} | T\_i \ge \mathbf{s}, \mathcal{Y}\_i(\mathbf{s})) = \Pr(\mathbf{s} \le T\_i < \mathbf{s} + d\mathbf{s} | T\_i \ge \mathbf{s}, \mathcal{Y}\_i(\mathbf{t})),\tag{2}$$

for all $s, t$ such that $0 < s \le t$ and $ds \to 0$. An equivalent definition is

$$\Pr(\mathcal{Y}\_i(t)|\mathcal{Y}\_i(s), T\_i \ge s) = \Pr(\mathcal{Y}\_i(t)|\mathcal{Y}\_i(s), T\_i = s), s \le t. \tag{3}$$


Penalized Spline Joint Models for Longitudinal and Time-To-Event Data


http://dx.doi.org/10.5772/intechopen.75975

On the other hand, endogenous time-varying covariates are the ones that do not satisfy the condition in (2.2). In particular,

$$\Pr(\mathcal{Y}\_i(t)|\mathcal{Y}\_i(s), T\_i \ge s) \ne \Pr(\mathcal{Y}\_i(t)|\mathcal{Y}\_i(s), T\_i = s), s \le t.$$

In the penalized spline regression models [8, 9], the observed longitudinal covariate is modeled using truncated power functions with a general power basis of degree $p$. Moreover, the longitudinal response is parameterized as a linear mixed-effects model in order to specify the individual curves and to control the amount of smoothing. As a result, the coefficients of the knots are constrained to handle smoothing. In particular, the longitudinal submodel for the $i$th subject at time point $t_{ij}$ is

$$\begin{aligned} y_{ij} &= f(t_{ij}) + g_i(t_{ij}) + \varepsilon_i(t_{ij}), \qquad \varepsilon_i(t_{ij}) \sim N(0, \sigma_\varepsilon^2),\\ f(t_{ij}) &= \beta_0 + \beta_1 t_{ij} + \dots + \beta_p t_{ij}^p + \sum_{k=1}^K u_{pk}\,(t_{ij} - K_k)_+^p,\\ g_i(t_{ij}) &= v_{i0} + v_{i1} t_{ij} + v_{i2} t_{ij}^2 + \dots + v_{ip} t_{ij}^p + \sum_{k=1}^K w_{ipk}\,(t_{ij} - K_k)_+^p. \end{aligned}\tag{4}$$

Here, the set $\{1, t_{ij}, \dots, t_{ij}^p, (t_{ij} - K_1)_+^p, \dots, (t_{ij} - K_K)_+^p\}$ is known as the truncated power basis of degree $p$. Moreover, $K_1, \dots, K_K$ are $K$ fitted knots, where $K$ is chosen following Ruppert et al. ([9], Chapter 5, Appendix D). The function $f(\cdot)$ is the smooth function which reflects the overall trend of the population, and $g_i(\cdot)$ is the smooth function which reflects the individual curves. To constrain the coefficients of the knots, the vector $(u_{p1}, \dots, u_{pK})^T$ in the function $f(\cdot)$ is treated as random effects. Therefore, $\beta^T = (\beta_0, \dots, \beta_p)$ is a $(p+1) \times 1$ vector of fixed effects and $b_i^T = (u_{p1}, \dots, u_{pK}, v_{i0}, \dots, v_{ip}, w_{ip1}, \dots, w_{ipK})$ is a $(p + 2K + 1) \times 1$ vector of random effects for the $i$th subject. The assumptions for the random effects of the $i$th subject are $(v_{i0}, \dots, v_{ip})^T \sim N(0, \Sigma)$, $u_{pk} \sim N(0, \sigma_u^2)$, $w_{ipk} \sim N(0, \sigma_w^2)$, and they are independent of one another. We can now rewrite (2.4) as

$$\begin{aligned} y_i(t_{ij}) &= f(t_{ij}) + g_i(t_{ij}) + \varepsilon_i(t_{ij})\\ &= \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \dots + \beta_p t_{ij}^p + \sum_{k=1}^K (u_{pk} + w_{ipk})\,(t_{ij} - K_k)_+^p\\ &\quad + v_{i0} + v_{i1} t_{ij} + v_{i2} t_{ij}^2 + \dots + v_{ip} t_{ij}^p + \varepsilon_i(t_{ij}). \end{aligned}\tag{5}$$

We let $u_{ipk} = u_{pk} + w_{ipk}$ and note that $u_{ipk} \sim N(0, \sigma_u^2 + \sigma_w^2)$. In order to allow greater flexibility, we assume that $(u_{ip1}, \dots, u_{ipK})^T \sim N(0, \mathbf{D})$, where $\mathbf{D} = \mathrm{Diag}(D_{11}, \dots, D_{KK})$. By doing this, the dimension of the vector of random effects, $b_i^T = (v_{i0}, \dots, v_{ip}, u_{ip1}, \dots, u_{ipK})$, decreases to $(p + K + 1) \times 1$. Consequently, the dimension of the multiple integrals in the log-likelihood function in (3.2) also decreases. This representation is crucial for reducing the computational burden while coding. The model in (2.5) now becomes:

$$\begin{aligned} y_i(t_{ij}) &= f(t_{ij}) + g_i(t_{ij}) + \varepsilon_i(t_{ij})\\ &= \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \dots + \beta_p t_{ij}^p + \sum_{k=1}^K u_{ipk}\,(t_{ij} - K_k)_+^p\\ &\quad + v_{i0} + v_{i1} t_{ij} + v_{i2} t_{ij}^2 + \dots + v_{ip} t_{ij}^p + \varepsilon_i(t_{ij}). \end{aligned}\tag{6}$$
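The truncated power basis of degree $p$ used in (2.4)-(2.6) is straightforward to construct numerically. The sketch below is our own illustration (the chapter's implementation is in R); placing knots at equally spaced quantiles of the observed times is one common convention, not necessarily the one used here.

```python
import numpy as np

def truncated_power_basis(t, knots, p):
    """Design matrices for a degree-p truncated power basis:
    X holds the polynomial columns [1, t, ..., t^p];
    Z holds one column (t - K_k)_+^p per knot K_k."""
    t = np.asarray(t, dtype=float)
    X = np.vander(t, p + 1, increasing=True)                  # n_i x (p+1)
    Z = np.maximum(t[:, None] - np.asarray(knots), 0.0) ** p  # n_i x K
    return X, Z

# Hypothetical measurement times; K = 4 knots at interior quantiles.
t_ij = np.linspace(0, 10, 25)
K = 4
knots = np.quantile(t_ij, np.linspace(0, 1, K + 2)[1:-1])
X, Z = truncated_power_basis(t_ij, knots, p=2)
print(X.shape, Z.shape)   # (25, 3) (25, 4)
```

Each truncated column is zero before its knot and a shifted polynomial after it, which is what lets the fitted curve change shape locally at each knot.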

The model in (2.6) can be rewritten in matrix notation as:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon}, \tag{7}$$

where


$$\mathbf{X} = \begin{bmatrix} \mathbf{X}\_1 \\ \vdots \\ \mathbf{X}\_n \end{bmatrix}, \quad \mathbf{Z} = \begin{bmatrix} \mathbf{X}\_1 & 0 & \dots & 0 & \mathbf{Z}\_1 & 0 & \dots & 0 \\ 0 & \mathbf{X}\_2 & \dots & 0 & 0 & \mathbf{Z}\_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \mathbf{X}\_n & 0 & 0 & \dots & \mathbf{Z}\_n \end{bmatrix},$$

$$\mathbf{X}_i = \begin{bmatrix} 1 & t_{i1} & t_{i1}^2 & \cdots & t_{i1}^p\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_{in_i} & t_{in_i}^2 & \cdots & t_{in_i}^p \end{bmatrix}, \quad \mathbf{Z}_i = \begin{bmatrix} (t_{i1} - K_1)_+^p & \cdots & (t_{i1} - K_K)_+^p\\ \vdots & & \vdots\\ (t_{in_i} - K_1)_+^p & \cdots & (t_{in_i} - K_K)_+^p \end{bmatrix},$$

$$\mathbf{b}^T = \begin{pmatrix} \mathbf{v}\_{10}, \dots, \mathbf{v}\_{1p}, \dots, \mathbf{v}\_{n0}, \dots, \mathbf{v}\_{np}, u\_{1p1}, \dots, u\_{1pK}, \dots, u\_{np1}, \dots, u\_{npK} \end{pmatrix},$$

$$\boldsymbol{\beta}^T = \begin{pmatrix} \boldsymbol{\beta}\_0, \dots, \boldsymbol{\beta}\_p \end{pmatrix}.$$

Here, $\mathbf{y}$ is the $\left(\sum_{i=1}^n n_i\right) \times 1$ vector of observed longitudinal data; $\mathbf{X}$ is the $\left(\sum_{i=1}^n n_i\right) \times (p+1)$ matrix of fixed-effect covariates; $\mathbf{Z}$ is the $\left(\sum_{i=1}^n n_i\right) \times (p+K+1)n$ matrix of random-effect covariates; and $\boldsymbol\varepsilon$ is the $\left(\sum_{i=1}^n n_i\right) \times 1$ vector of errors.
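The block structure of $\mathbf{X}$ and $\mathbf{Z}$ in (2.7), and the stated dimensions, can be checked with a short sketch (our own scaffolding; the subject counts, degree, and knots are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, n = 2, 3, 4                      # degree, number of knots, subjects
n_i = [3, 5, 4, 6]                     # measurements per subject
knots = np.array([2.0, 5.0, 8.0])

def design(t):
    """Per-subject design matrices X_i (polynomial) and Z_i (truncated)."""
    X = np.vander(t, p + 1, increasing=True)
    Z = np.maximum(t[:, None] - knots, 0.0) ** p
    return X, Z

def block_diag(mats):
    """Block-diagonal stack of the given matrices."""
    out = np.zeros((sum(m.shape[0] for m in mats),
                    sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out

Xi, Zi = zip(*(design(rng.uniform(0, 10, m)) for m in n_i))

# X = [X_1; ...; X_n] and Z = [blockdiag(X_i) | blockdiag(Z_i)], matching
# b^T = (v_1^T, ..., v_n^T, u_1^T, ..., u_n^T) in the ordering above.
X = np.vstack(Xi)
Z = np.hstack([block_diag(list(Xi)), block_diag(list(Zi))])

N = sum(n_i)
print(X.shape, Z.shape)   # (18, 3) (18, 24): N x (p+1) and N x n(p+K+1)
```

With $N = 18$, $p = 2$, $K = 3$ and $n = 4$, the random-effects design indeed has $n(p+K+1) = 24$ columns.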

Postulating a proportional hazards model, the penalized spline joint model for longitudinal and time-to-event data is defined by

$$\begin{split} h\_i(t|\mathcal{M}\_i(t), \mathfrak{w}\_i) &= \lim\_{dt \to 0} \Pr\{t \le T\_i^\* < t + dt | T\_i^\* \ge t, \mathcal{M}\_i(t), \mathfrak{w}\_i\}/dt \\ &= h\_0(t) \exp\left\{\gamma^T \mathfrak{w}\_i + \alpha m\_i(t)\right\}, \end{split} \tag{8}$$

where $h_0(t)$ is the baseline hazard and $w_i$ is a vector of baseline covariates (such as a treatment indicator, the gender of a patient, etc.). Furthermore, $\mathcal{M}_i(t) = \{m_i(s); 0 \le s < t\}$ denotes the history of the true unobserved longitudinal process up to time point $t$.

Using (2.7), the longitudinal submodel for the $i$th subject is given by

$$\begin{cases} y_i(t) = m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma_\varepsilon^2)\\ m_i(t) = \mathbf{X}_i^T(t)\boldsymbol\beta + \mathbf{X}_i^T(t)\mathbf{v}_i + \mathbf{Z}_i^T(t)\mathbf{u}_i, \end{cases}\tag{9}$$
 
$$\mathbf{v}_i \sim N(0, \Sigma), \qquad \mathbf{u}_i \sim N(0, \mathbf{D}),$$


where the covariance matrix of the random effects $b_i^T = (v_{i0}, \dots, v_{ip}, u_{ip1}, \dots, u_{ipK})$ is given as

$$\mathbf{G} = \mathrm{Cov}(b_i) = \begin{bmatrix} \Sigma & 0\\ 0 & \mathbf{D} \end{bmatrix}.$$
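The longitudinal submodel (9), with $\mathbf{v}_i$ and $\mathbf{u}_i$ drawn from the two blocks of $\mathbf{G}$, can be used to simulate a subject-specific trajectory. A sketch with hypothetical parameter values of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2
knots = np.array([2.5, 5.0, 7.5])     # K = 3 hypothetical knots
beta = np.array([1.0, 0.5, -0.05])    # fixed effects (hypothetical)
Sigma = np.diag([0.2, 0.05, 0.005])   # Cov(v_i)
D = np.diag([0.1, 0.1, 0.1])          # Cov(u_i), diagonal as in the text
sigma_eps = 0.3

# Design at the i-th subject's time points: polynomial part X_i and
# truncated power part Z_i, as in (2.7).
t = np.linspace(0, 10, 20)
X = np.vander(t, p + 1, increasing=True)
Z = np.maximum(t[:, None] - knots, 0.0) ** p

v = rng.multivariate_normal(np.zeros(p + 1), Sigma)    # v_i ~ N(0, Sigma)
u = rng.multivariate_normal(np.zeros(len(knots)), D)   # u_i ~ N(0, D)

m = X @ (beta + v) + Z @ u                           # true trajectory m_i(t)
y = m + rng.normal(0.0, sigma_eps, size=t.shape)     # observed y_i(t)
print(y.shape)   # (20,)
```

The knot terms in $\mathbf{Z}_i \mathbf{u}_i$ are what let each simulated subject deviate non-linearly from the population polynomial trend.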

To complete the specification of the model in (2.8), we need to define the form of the baseline risk function $h_0(\cdot)$. In real applications, $h_0(\cdot)$ is usually unknown, so two options are adopted to determine its form in this chapter. First, a standard option is to use a known parametric distribution for the risk function; for this option, the Gompertz distribution is chosen. Second, a piecewise-constant model is chosen when $h_0(\cdot)$ is left completely unspecified.

Therefore, the proposed penalized spline joint models considered in this chapter are as follows:

Model 1: Penalized spline joint model with hazard rate at baseline having a Gompertz distribution

$$\begin{cases} h_i(t \mid \mathcal{M}_i(t), w_i) = \lambda_1 \exp(\lambda_2 t) \exp\{\gamma^T w_i + \alpha m_i(t)\}\\ m_i(t) = \mathbf{X}_i^T(t)\boldsymbol\beta + \mathbf{X}_i^T(t)\mathbf{v}_i + \mathbf{Z}_i^T(t)\mathbf{u}_i. \end{cases}\tag{10}$$

Model 2: Penalized spline joint model with a piecewise-constant baseline risk function

$$\begin{cases} h_i(t \mid \mathcal{M}_i(t), w_i) = \sum_{q=1}^{Q} \xi_q I(\nu_{q-1} < t \le \nu_q) \exp\{\gamma^T w_i + \alpha m_i(t)\}\\ m_i(t) = \mathbf{X}_i^T(t)\boldsymbol\beta + \mathbf{X}_i^T(t)\mathbf{v}_i + \mathbf{Z}_i^T(t)\mathbf{u}_i, \end{cases}\tag{11}$$

where $0 = \nu_0 < \nu_1 < \dots < \nu_Q$ denotes a split of the time scale, with $\nu_Q$ larger than the largest observed time, and $\xi_q$ denotes the value of the baseline hazard in the interval $(\nu_{q-1}, \nu_q]$. In both models, $\mathbf{X}_i$, $\mathbf{Z}_i$, $\boldsymbol\beta$, $\mathbf{v}_i$ and $\mathbf{u}_i$ are given in (2.7).
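Both baseline hazard options are easy to evaluate directly. The sketch below (all parameter values hypothetical, chosen by us) computes the Gompertz hazard $h_0(t) = \lambda_1 \exp(\lambda_2 t)$ of Model 1 and the piecewise-constant hazard of Model 2 together with its cumulative hazard, which is what the survival term in the likelihood requires.

```python
import math

# Model 1: Gompertz baseline hazard h0(t) = lambda1 * exp(lambda2 * t).
def h0_gompertz(t, lambda1=0.1, lambda2=0.2):
    return lambda1 * math.exp(lambda2 * t)

# Model 2: piecewise-constant baseline hazard on the grid
# 0 = nu_0 < nu_1 < ... < nu_Q, with value xi_q on (nu_{q-1}, nu_q].
nu = [0.0, 1.0, 3.0, 6.0, 10.0]   # nu_Q exceeds the largest observed time
xi = [0.05, 0.10, 0.20, 0.40]

def h0_piecewise(t):
    for q in range(1, len(nu)):
        if nu[q - 1] < t <= nu[q]:
            return xi[q - 1]
    raise ValueError("t lies outside (0, nu_Q]")

def H0_piecewise(t):
    """Cumulative baseline hazard: sum of xi_q times the overlap of
    (nu_{q-1}, nu_q] with (0, t]; exp(-H0) is the baseline survival."""
    return sum(xi[q - 1] * max(0.0, min(t, nu[q]) - nu[q - 1])
               for q in range(1, len(nu)))

print(h0_piecewise(2.0))             # 0.1
print(round(H0_piecewise(2.0), 10))  # 0.15
```

Because the hazard is constant on each interval, the cumulative hazard is a simple weighted sum of interval lengths, which is why the piecewise-constant option keeps the survival function tractable even with $h_0(\cdot)$ unspecified.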

3. Parameter estimation

After defining the two penalized spline joint models, we now present the joint likelihood and score functions of the parameters in the models. The ECM algorithm is also presented in this section.

3.1. Likelihood and score functions


Following Rizopoulos [2], we assume that the vector of time-independent random effects bi underlies both the longitudinal and survival processes. This means that

$$\begin{aligned} p(T_i, \delta_i, y_i \mid b_i; \theta) &= p(T_i, \delta_i \mid b_i; \theta)\, p(y_i \mid b_i; \theta),\\ p(y_i \mid b_i; \theta) &= \prod_j p\{y_i(t_{ij}) \mid b_i; \theta\}, \end{aligned}\tag{12}$$

where $\theta = (\theta_t^T, \theta_y^T, \theta_b^T)^T$ denotes the full parameter vector, with $\theta_t = (\gamma^T, \alpha, \theta_{h_0}^T)^T$ denoting the parameter vector for the survival outcomes. Furthermore, $\theta_y = (\beta^T, \sigma_\varepsilon^2)^T$ is the parameter vector for the longitudinal outcomes and $\theta_b = \mathrm{vech}(\mathbf{G})$ is the vector-half of the covariance matrix of the random effects. In addition, we assume that the hazard rate at time $t$, conditional on the covariate path, depends on the current value of the longitudinal outcome, and that the censoring mechanism is independent of the true event times and of future longitudinal measurements. Under these assumptions, the log-likelihood of the penalized spline joint models can be written as

$$\begin{aligned} l(\theta) &= l(\theta \mid T_i, \delta_i, y_i)\\ &= \sum_i \log \int_{b_i} p(T_i, \delta_i \mid b_i; \theta_t, \boldsymbol\beta)\, p(y_i \mid b_i; \theta_y)\, p(b_i; \theta_b)\, db_i, \end{aligned}\tag{13}$$

where the conditional density for survival part has the form of

$$\begin{aligned} p(T_i, \delta_i \mid b_i; \theta_t, \boldsymbol\beta) &= h\{T_i \mid \mathcal{M}_i(T_i), w_i; \theta_t, \boldsymbol\beta\}^{\delta_i}\, S\{T_i \mid \mathcal{M}_i(T_i), w_i; \theta_t, \boldsymbol\beta\}\\ &= \left[h_0(T_i) \exp\{\gamma^T w_i + \alpha m_i(T_i)\}\right]^{\delta_i} \exp\left[-\int_0^{T_i} h_0(s) \exp\{\gamma^T w_i + \alpha m_i(s)\}\, ds\right]. \end{aligned}\tag{14}$$

Here, $S(t)$ is the survival function at time $t$.

Moreover, the density for the longitudinal part with the random effects is given by

$$\begin{split} p\big(\mathbf{y}\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_y\big)\, p\big(\mathbf{b}\_i; \boldsymbol{\theta}\_b\big) &= \prod\_j p\big(y\_i(t\_{ij})|\mathbf{b}\_i; \boldsymbol{\theta}\_y\big)\, p\big(\mathbf{b}\_i; \boldsymbol{\theta}\_b\big) \\ &= \frac{1}{\left(2\pi\sigma\_\varepsilon^2\right)^{n\_i/2}} \exp\left\{-\frac{\|\mathbf{y}\_i - \mathbf{X}\_i\boldsymbol{\beta} - \mathbf{X}\_i\mathbf{v}\_i - \mathbf{Z}\_i\mathbf{u}\_i\|^2}{2\sigma\_\varepsilon^2}\right\} \\ &\quad\times (2\pi)^{-q\_b/2} \det(\mathbf{G})^{-1/2} \exp\left(-\mathbf{b}\_i^T\mathbf{G}^{-1}\mathbf{b}\_i/2\right), \end{split} \tag{15}$$

where $q\_b$ denotes the dimensionality of the random effects vector and $n\_i$ the number of repeated measurements for subject $i$.

We consider the log-likelihood of the complete data $(T\_i, \delta\_i, \mathbf{y}\_i, \mathbf{b}\_i)$ over the unknown $\boldsymbol{\theta}\_t$, $\boldsymbol{\beta}$ and $\mathbf{b}\_i$:

$$\log l(\boldsymbol{\theta}|T\_i, \delta\_i, \mathbf{y}\_i, \mathbf{b}\_i) = \log p\big(T\_i, \delta\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_t, \boldsymbol{\beta}\big) + \log p\big(\mathbf{y}\_i|\mathbf{b}\_i; \boldsymbol{\beta}\big) + \log p\big(\mathbf{b}\_i; \mathbf{G}\big).$$
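For concreteness, the three contributions to this complete-data log-likelihood can be evaluated numerically. The sketch below is illustrative only (all function and argument names are hypothetical): it assumes a constant baseline hazard $h\_0$ for simplicity and approximates the cumulative hazard with Gauss-Legendre quadrature.

```python
import numpy as np

def complete_data_loglik(T, delta, y, t_obs, b, *, beta, gamma, alpha,
                         w, h0, sigma2_eps, G, m_fn):
    """Complete-data log-likelihood for one subject (hypothetical names).

    m_fn(t, beta, b) returns the subject-specific trajectory m_i(t);
    h0 is a constant baseline hazard (an assumption for illustration).
    """
    # Survival part: delta * log h(T) - cumulative hazard H(T)
    log_h = np.log(h0) + gamma @ w + alpha * m_fn(T, beta, b)
    # Cumulative hazard via Gauss-Legendre quadrature on [0, T]
    nodes, weights = np.polynomial.legendre.leggauss(15)
    s = 0.5 * T * (nodes + 1.0)                       # map nodes to [0, T]
    H = 0.5 * T * np.sum(weights * h0 * np.exp(gamma @ w + alpha * m_fn(s, beta, b)))
    log_surv = delta * log_h - H

    # Longitudinal part: independent Gaussian errors around m_i(t_ij)
    resid = y - m_fn(t_obs, beta, b)
    log_long = -0.5 * (len(y) * np.log(2 * np.pi * sigma2_eps)
                       + resid @ resid / sigma2_eps)

    # Random-effects part: b ~ N(0, G)
    q = len(b)
    log_re = -0.5 * (q * np.log(2 * np.pi) + np.log(np.linalg.det(G))
                     + b @ np.linalg.solve(G, b))
    return log_surv + log_long + log_re
```

This is the integrand of (13); marginalizing it over the random effects gives each subject's likelihood contribution.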


http://dx.doi.org/10.5772/intechopen.75975

Penalized Spline Joint Models for Longitudinal and Time-To-Event Data


The function to be maximized for the log-likelihood involves the density function of the survival time and a least-squares term with a penalty, namely

$$\log p\big(T\_i, \delta\_i | \mathbf{b}\_i; \boldsymbol{\theta}\_t, \boldsymbol{\beta}\big) - \frac{\left(\mathbf{y}\_i - \mathbf{X}\_i\boldsymbol{\beta} - \mathbf{X}\_i\mathbf{v}\_i - \mathbf{Z}\_i\mathbf{u}\_i\right)^T \left(\mathbf{y}\_i - \mathbf{X}\_i\boldsymbol{\beta} - \mathbf{X}\_i\mathbf{v}\_i - \mathbf{Z}\_i\mathbf{u}\_i\right)}{\sigma\_\varepsilon^2} - \mathbf{b}\_i^T\mathbf{G}^{-1}\mathbf{b}\_i. \tag{16}$$

According to Ruppert et al. [9], the term $\sigma\_\varepsilon^2\,\mathbf{b}\_i^T\mathbf{G}^{-1}\mathbf{b}\_i$ is called a roughness penalty, and the variance components matrix is defined as $\mathbf{F} = \sigma\_\varepsilon^2\mathbf{G}^{-1}$. By a Lagrange multiplier argument, the variance components matrix acts as a constraint on the coefficients of the knots $\mathbf{u}\_i$. This restricts the influence of the variables $(t - K\_k)\_+^p$ and leads to smoother spline functions.
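The effect of this penalty can be illustrated with a small penalized least-squares fit. The sketch below is an illustration under simplifying assumptions (names are hypothetical): it uses a linear truncated power basis and replaces $\mathbf{F} = \sigma\_\varepsilon^2\mathbf{G}^{-1}$ by a scalar multiple of the identity acting only on the knot coefficients.

```python
import numpy as np

def pspline_fit(t, y, knots, lam):
    """Penalized linear spline fit via ridge-type normal equations.

    Basis: [1, t] plus truncated lines (t - k)_+; `lam` plays the role
    of the roughness penalty sigma_eps^2 * G^{-1} (here lam * identity).
    """
    X = np.column_stack([np.ones_like(t), t])           # fixed part
    Z = np.maximum(np.subtract.outer(t, knots), 0.0)    # (t - k)_+ columns
    C = np.hstack([X, Z])
    # The penalty applies only to the knot coefficients u, not to beta
    D = np.diag([0.0] * X.shape[1] + [lam] * Z.shape[1])
    coef = np.linalg.solve(C.T @ C + D, C.T @ y)
    return coef, C @ coef
```

Larger values of `lam` shrink the knot coefficients toward zero and hence give smoother fits, mirroring the constraint on $\mathbf{u}\_i$ described above.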

Using (3.2), the score vector for the penalized spline joint models can be expressed as:

$$\begin{split} S(\boldsymbol{\theta}) &= \sum\_i \frac{\partial}{\partial\boldsymbol{\theta}^T} \log \int p\big(T\_i, \delta\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_t, \boldsymbol{\beta}\big)\, p\big(\mathbf{y}\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_y\big)\, p\big(\mathbf{b}\_i; \boldsymbol{\theta}\_b\big)\, d\mathbf{b}\_i \\ &= \sum\_i \int \frac{\partial}{\partial\boldsymbol{\theta}^T} \log\left\{p\big(T\_i, \delta\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_t, \boldsymbol{\beta}\big)\, p\big(\mathbf{y}\_i|\mathbf{b}\_i; \boldsymbol{\theta}\_y\big)\, p\big(\mathbf{b}\_i; \boldsymbol{\theta}\_b\big)\right\} p\big(\mathbf{b}\_i|T\_i, \delta\_i, \mathbf{y}\_i; \boldsymbol{\theta}\big)\, d\mathbf{b}\_i. \end{split} \tag{17}$$

The requirement for numerical integration with respect to the random effects is one of the main difficulties in joint models [2]. The maximum likelihood estimate (MLE) is typically obtained using standard maximization algorithms such as the expectation-maximization (EM) algorithm or the Newton-Raphson algorithm.
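As an illustration of the required integration over the random effects, the following sketch approximates a marginal likelihood contribution for a one-dimensional random effect by Gauss-Hermite quadrature (the function name and signature are hypothetical; joint models generally require the multivariate, and often adaptive, version).

```python
import numpy as np

def marginal_loglik_1d(log_joint, sd=1.0, n_nodes=20):
    """Approximate log ∫ f(b) N(b; 0, sd^2) db by Gauss-Hermite quadrature.

    log_joint(b) should return log f(b) = log p(T, delta | b) p(y | b);
    a sketch for a single one-dimensional random effect.
    """
    # Probabilists' Hermite rule: integrates against exp(-x^2 / 2)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    b = sd * nodes                                     # substitute b = sd * x
    vals = np.array([np.exp(log_joint(bi)) for bi in b])
    # Normalize by sqrt(2*pi) to recover the standard normal density
    return np.log(np.sum(weights * vals) / np.sqrt(2 * np.pi))
```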

#### 3.2. The ECM algorithm

The EM algorithm has been widely used in joint models, for example in the standard joint model of Rizopoulos [2] and in the generalized linear mixed joint model [10]. The ECM algorithm is a natural extension of the EM algorithm in which the maximization in the M-step is carried out conditionally on some functions of the parameters under estimation; it can also reduce computation time. In this chapter, the ECM algorithm is used to obtain the maximum likelihood estimates of the penalized spline joint models, following McLachlan and Krishnan [11].

In these models, the random effects $\mathbf{b}\_i$ are treated as missing data. Hence, it is difficult to directly estimate the parameter vector $\boldsymbol{\theta}$ that maximizes the observed-data log-likelihood $l(\boldsymbol{\theta})$ in (3.2). Alternatively, we can estimate the parameter vector $\boldsymbol{\theta}$ that maximizes the expected value of the complete-data log-likelihood, $E\left\{\log p\big(T\_i, \delta\_i, \mathbf{y}\_i, \mathbf{b}\_i; \boldsymbol{\theta}\big) \mid T\_i, \delta\_i, \mathbf{y}\_i; \boldsymbol{\theta}^{(it)}\right\}$, where $\boldsymbol{\theta}^{(it)}$ is the parameter vector at the $it$-th iteration.

The following are the steps of this algorithm.

#### Step 1: Initialization


112 Topics in Splines and Applications


We first initialize the parameters. We assume that there are $m$ parameters in the model and that the starting value of the parameter vector is $\boldsymbol{\theta}^{(0)} = \left(\theta\_1^{(0)}, \ldots, \theta\_m^{(0)}\right)$. Based on these initial values, we calculate the log-likelihood using (3.2).

#### Step 2: The E-step for the penalized joint models

We fill in the missing data and replace the log-likelihood function of the observed data with the expected function of the complete data log-likelihood as follows:

$$\begin{split} Q\big(\boldsymbol{\theta}|\boldsymbol{\theta}^{(it)}\big) &= \sum\_i \int \log\left\{p\big(T\_i, \delta\_i, \mathbf{y}\_i, \mathbf{b}\_i; \boldsymbol{\theta}\big)\right\} p\big(\mathbf{b}\_i|T\_i, \delta\_i, \mathbf{y}\_i; \boldsymbol{\theta}^{(it)}\big)\, d\mathbf{b}\_i \\ &= \sum\_i \int \left(\log p\big(T\_i, \delta\_i|\mathbf{b}\_i; \boldsymbol{\theta}\big) + \log p\big(\mathbf{y}\_i|\mathbf{b}\_i; \boldsymbol{\theta}\big) + \log p\big(\mathbf{b}\_i; \boldsymbol{\theta}\big)\right) p\big(\mathbf{b}\_i|T\_i, \delta\_i, \mathbf{y}\_i; \boldsymbol{\theta}^{(it)}\big)\, d\mathbf{b}\_i. \end{split} \tag{18}$$

#### Step 3: The conditional M-step for the penalized joint models.

This step will be implemented in four stages as follows:

3.1 Given the current value of the parameter vector at the $it$-th iteration, $\boldsymbol{\theta}^{(it)} = \left(\theta\_1^{(it)}, \theta\_2^{(it)}, \ldots, \theta\_m^{(it)}\right)$, we calculate the log-likelihood $l\big(\boldsymbol{\theta}^{(it)}\big) = \sum\_i \log \int\_{\mathbf{b}\_i} p\big(T\_i, \delta\_i, \mathbf{y}\_i, \mathbf{b}\_i; \boldsymbol{\theta}^{(it)}\big)\, d\mathbf{b}\_i$.

3.2 Propose a new value for the first parameter, $\theta\_1^{(prop)}$, which maximizes $Q\big(\boldsymbol{\theta}|\boldsymbol{\theta}^{(it)}\big)$. Then we calculate the log-likelihood $l\big(\boldsymbol{\theta}^{(prop)}\big)$, where $\boldsymbol{\theta}^{(prop)} = \left(\theta\_1^{(prop)}, \theta\_2^{(it)}, \ldots, \theta\_m^{(it)}\right)$.

3.3 Set $\theta\_1^{(it)} = \theta\_1^{(prop)}$ if $l\big(\boldsymbol{\theta}^{(prop)}\big) \ge l\big(\boldsymbol{\theta}^{(it)}\big)$; otherwise, keep $\theta\_1^{(it)}$ unchanged.

3.4 Similarly, based on the updated value of $\theta\_1^{(it)}$, we update the second parameter, and continue in the same way through the last parameter, $\theta\_m^{(it)}$; we then set $\boldsymbol{\theta}^{(it+1)} = \left(\theta\_1^{(it)}, \ldots, \theta\_m^{(it)}\right)$.

#### Step 4: Iterate among steps 2–3 until the algorithm numerically converges.
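The iteration scheme in Steps 1-4 can be sketched as a generic coordinate-wise loop. The names below are hypothetical: `propose_max[j]` stands in for the conditional maximization of $Q$ over the $j$-th parameter, and each proposal is accepted only if it does not decrease the observed-data log-likelihood, as in stage 3.3.

```python
import numpy as np

def ecm(theta0, propose_max, loglik, max_iter=100, tol=1e-6):
    """Sketch of the conditional M-step loop (hypothetical API).

    propose_max[j](theta) returns the value of parameter j that maximizes
    Q(theta | theta_it) with the other parameters held fixed; loglik is
    the observed-data log-likelihood used to accept or reject proposals.
    """
    theta = np.asarray(theta0, dtype=float)
    ll = loglik(theta)
    for _ in range(max_iter):
        ll_old = ll
        for j, maximize_j in enumerate(propose_max):   # stages 3.1-3.4
            prop = theta.copy()
            prop[j] = maximize_j(theta)
            ll_prop = loglik(prop)
            if ll_prop >= ll:                          # stage 3.3 acceptance
                theta, ll = prop, ll_prop
        if abs(ll - ll_old) < tol:                     # Step 4 convergence
            break
    return theta, ll
```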

To update the parameters in the conditional M-step, we have closed-form estimates for the measurement-error variance $\sigma\_\varepsilon^2$ and for the covariance matrix of the random effects, obtained by maximizing the expected function $Q\big(\boldsymbol{\theta}|\boldsymbol{\theta}^{(it)}\big)$. Unfortunately, closed-form expressions cannot be obtained for the remaining parameters. We therefore employ a one-step Newton-Raphson approach to obtain the updates $\boldsymbol{\beta}^{(it+1)}$, $\boldsymbol{\gamma}^{(it+1)}$, $\alpha^{(it+1)}$ and $\boldsymbol{\theta}\_{h\_0}^{(it+1)}$, as detailed in Appendix B.
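For the parameters without closed-form updates, a single Newton-Raphson step has the familiar form below. This is a generic sketch, not the chapter's specific score and Hessian expressions from Appendix B.

```python
import numpy as np

def one_step_newton(theta, score, hessian):
    """Single Newton-Raphson update: theta - H(theta)^{-1} S(theta)."""
    return theta - np.linalg.solve(hessian(theta), score(theta))
```

In the ECM context, `score` and `hessian` would be the derivatives of $Q\big(\boldsymbol{\theta}|\boldsymbol{\theta}^{(it)}\big)$ with respect to the block of parameters being updated.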

Following Louis [12], standard errors for the parameter estimates can be calculated by using the estimated observed information matrix

$$
\widehat{\operatorname{var}}\big(\widehat{\boldsymbol{\theta}}\big) = \left\{\mathcal{I}\big(\widehat{\boldsymbol{\theta}}\big)\right\}^{-1},
$$

where

$$\mathcal{I}\big(\widehat{\boldsymbol{\theta}}\big) = -\sum\_{i=1}^{n} \frac{\partial S\_i(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\bigg|\_{\boldsymbol{\theta}=\widehat{\boldsymbol{\theta}}}.$$
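Numerically, the observed information can be approximated by differentiating the score, and the standard errors follow from its inverse. The sketch below (hypothetical names) uses central finite differences in place of the analytic derivatives.

```python
import numpy as np

def louis_se(score, theta_hat, eps=1e-5):
    """Standard errors from the observed information I = -dS/dtheta at theta_hat.

    The Jacobian of the score is approximated by central finite differences,
    a numerical stand-in for the analytic expressions.
    """
    p = len(theta_hat)
    info = np.zeros((p, p))
    for k in range(p):
        e = np.zeros(p)
        e[k] = eps
        info[:, k] = -(score(theta_hat + e) - score(theta_hat - e)) / (2 * eps)
    cov = np.linalg.inv(info)          # estimated covariance of theta_hat
    return np.sqrt(np.diag(cov))
```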

Furthermore, $\boldsymbol{\theta}\_y = \left(\beta\_0, \beta\_1, \sigma\_\varepsilon^2\right)^T$ is the parameter vector for the longitudinal outcomes and $\boldsymbol{\theta}\_b = \mathbf{D}$ is the variance matrix of the random effects. The hazard for subject $i$ is

$$h\_i(t) = 0.1\exp(0.5t)\exp\left\{0.5x\_i + 0.05\, m\_i(t)\right\}.$$

To simulate the observed survival time $T\_i$ of the joint model in (4.1), we applied the methods adapted by Bender et al. [13], Austin [14] and Crowther and Lambert [15] to generate the true survival time. We further assumed that the censoring mechanism is exponentially distributed with parameter $\lambda$. The observed survival time was the minimum of the censoring time and the true survival time. We generated the survival time $T\_i$ for $n = 500$ subjects with the parameters $\beta\_0 = 5$, $\beta\_1 = 2$, $\lambda\_1 = 0.1$, $\lambda\_2 = 0.5$, $\gamma = 0.5$, $\alpha = 0.05$, $\delta = 2$ and $\mathbf{D} = \operatorname{Diag}(2, 2, 2, 4)$. Then we generated the longitudinal responses $m\_i(t)$ using (4.2). The simulated model is therefore

$$m\_i(t) = 5 + 2t + u\_{i1}(t-1)\_+ + u\_{i2}(t-2)\_+ + u\_{i3}(t-3)\_+ + v\_{i0}.$$

The sample of simulated data is presented in Appendix A. The Kaplan-Meier estimate of the survival function of the simulated data (left panel) and the longitudinal trajectories for the whole simulated sample (right panel) are presented in Figure 1. The dashed lines in the left panel correspond to 95% pointwise confidence intervals. The Kaplan-Meier plot shows that the survival probability starts from 1 and decreases gradually until the 5th month of the study, and is nearly zero after 6 months or so. The right panel shows the longitudinal trajectories for the first 100 subjects, reflecting the form in (4.2).

Figure 1. Kaplan-Meier estimate of the survival function of the simulated data of (4.4) (left panel). Longitudinal trajectories of the first 100 subjects from the simulated sample of (4.4) (right panel).

#### 4.1.2. Parameter estimation

The ECM algorithm, as described in Section 3.2, is now implemented to estimate all parameters in (4.4). The initial values of the parameters were set at $\beta\_0 = 1$, $\beta\_1 = 1$, $\lambda\_1 = 0.05$, $\lambda\_2 = 0.1$, $\gamma = 0.1$, $\alpha = 0.01$, $\sigma = 1$, $D\_{11} = 3$, $D\_{22} = 3$, $D\_{33} = 3$ and $D\_{44} = 3$, although these initial values can also be set randomly. The traces of each of these parameters are presented in Figures 2 and 3, respectively. The traces of the estimates show how the algorithm updates
