Section 2 Frailty Models

#### **Chapter 5**

## Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model

*Chong Zhong, Zhihua Ma, Junshan Shen and Catherine Liu*

#### **Abstract**

Bayesian paradigm takes advantage of well-fitting complicated survival models and feasible computing in survival analysis owing to the superiority in tackling the complex censoring scheme, compared with the frequentist paradigm. In this chapter, we aim to display the latest tendency in Bayesian computing, in the sense of automating the posterior sampling, through a Bayesian analysis of survival modeling for multivariate survival outcomes with the complicated data structure. Motivated by relaxing the strong assumption of proportionality and the restriction of a common baseline population, we propose a generalized shared frailty model which includes both parametric and nonparametric frailty random effects to incorporate both treatment-wise and temporal variation for multiple events. We develop a survival-function version of the ANOVA dependent Dirichlet process to model the dependency among the baseline survival functions. The posterior sampling is implemented by the No-U-Turn sampler in Stan, a contemporary Bayesian computing tool, automatically. The proposed model is validated by analysis of the bladder cancer recurrences data. The estimation is consistent with existing results. Our model and Bayesian inference provide evidence that the Bayesian paradigm fosters complex modeling and feasible computing in survival analysis, and Stan relaxes the posterior inference.

**Keywords:** ANOVA DDP, dependent treatments, multivariate survival outcomes, recurrence, Stan

#### **1. Introduction**

The shared frailty model, coined by [1], has been widely used in the analysis of multivariate survival outcomes that might be associated with subgroups or clusters. Enormous work has been devoted to the development of the shared frailty model in both Bayesian and frequency paradigms, and the reviews can be found in [2, 3]. As an extension of the well-known Cox's proportional hazard model, *conditional on the frailty effect*, the traditional shared frailty model assumes a proportional hazards structure, that is, the hazard ratio between two sets of covariate values is proportional to their difference in relative risk scores over time [4]. Meanwhile, it fixes the baseline hazard function among all clusters.

Traditional shared frailty models provide a good framework for expediently mathematical tractability of the heterogeneity among multivariate observations, whereas in practice it needs modification and adaption to tolerate complex structure so as to incorporate cross information owing to the intra- and inter-subject variability [5, 6]. Take the renowned data on recurrences of bladder cancer, for instance [7]. There are three treatment arms, placebo, thiotepa, and pyridoxine. Patients had multiple recurrences of tumors which were sparse beyond the fourth recurrence. **Figure 1** displays the Kaplan-Meier estimators of the survival-function for the times of the first and the second recurrences under three treatments. One observes that in **Figure 1(a)**, the estimated survival curves at the first recurrence are crossed, indicating a crossed hazard, and therefore, the proportional hazard assumption is suspected [8]; in **Figure 1(b)**, the survival curve of pyridoxine drops below that of placebo from the fifth month at the second recurrence rather than above from the 10th month on at the first recurrence. This indicates the functional form of the survival curves varies between recurrences. Neglecting such characteristics of non-proportionality and stratification of recurrences may yield inefficiency by encumbering borrowing strength from potentially related information sources, and consequently may jeopardize the prediction of the global survival times. Moreover, dependency might exist among the treatment strata and the stratification of recurrences [5, 9].

Consequently, more complex modeling is needy to characterize the dependence among the baseline hazard functions and treatment strata due to the temporal effects of recurrences. Frequentist inference and computing are pretty challenging and even infeasible within the most complex model setting. In Bayesian literature, there exists work that allows the baseline survival/hazard function to vary on a single level such as subgroups or the time axis ([10, 11]; among others). Nevertheless, rare work has taken bi-level varying baseline survival/hazard function into account [12], not to mention that dependence among treatment strata [5].

We propose a generalized shared frailty model (GSFM) for multiple events time data that allows the baseline hazard function to change dually along with the types of events and treatment strata, so as to strengthen the ability to borrow information from many sources. The proposed GSFM postulates multiplier frailties including both parametric and nonparametric ones, where the parametric frailty random effect accounts for the within-subject association by treating each subject as a cluster; and a nonparametric frailty effect represents dependency among treatment strata and temporal recurrences. For the GSFM, we suggest a Bayesian solution to estimate the regression coefficient vector, the variance parameter of the frailty

**Figure 1.**

*The Kaplan-Meier estimator of survival functions for first recurrence time (a) and second recurrence (b) in the bladder cancer data.*

*Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

term, and baseline survival functions stratified by treatments and recurrences. In a Bayesian workflow, the posterior distribution is determined by the combination of observational data in the form of the likelihood function and the prior distribution represented based on the background knowledge. From a Bayesian perspective, we model the dependent nonparametric prior through transferring the data context aforementioned into the ANOVA dependent Dirichlet process (ANOVA DDP), which will be further reviewed in Section 2. The construction of the No-U-Turn sampler for Markov chain Monte Carlo (MCMC) sampling is automated by Stan [13] with its R interface [14]. The posterior inference is conducted by Stan as well.

The rest of this chapter is organized as follows. In Section 2, under typical data scenarios of dependence structure, we summarize several modified versions of the dependent Dirichlet process (DDP) initiated from MacEachern's regression spirit that nests dependent predictors into the traditional Dirichlet process (DP). In Section 3, we postulate the GSFM and transform the dependent dual-stratified multiple events to the survival-function version of the ANOVA DDP. We have a short comparison between Stan and Nimble, two contemporary Bayesian computing tools based on our user experience. In Section 4, we demonstrate the validity of the GSFM, its Bayesian inference, and analysis of the data on recurrences of bladder cancer. A brief conclusion is contained in Section 5.

#### **2. Review of MacEachern's DDP**

The DP is the most popular Bayesian nonparametric prior since the seminal work of [15]. The belief in data background that there exists some kind of dependence structure stimulates construction and selection of proper dependent prior. Some dependent DPs are constructed for unsupervised purposes such as clustering [16, 17]. The DDP prior adopted in our proposed model is supervised and predictordependent, originated from [18, 19], named as MacEachern's DDP in two recent review papers, which are interpretive and comprehensive [20, 21]. The key idea behind MacEachern's DDP is that the distributions of the random measures are marginally DP distributed, validated by in our Subsections 3.2 and 3.3. Therefore, we here confine how MacEachern's DDP (henceforth we use the DDP to denote the MacEachern's DDP if the context is clear) came into being expanded from the DP; and compare various modified versions of the DDP under various dependency structures. We focus on two fundamental elements of Sethuraman's construction of DP [22], the weight and the atom, following the insight of [21].

#### **2.1 DP vs. DDP**

The DP is a distribution on distributions whereas the DDP aims to construct a prior for a collection of distributions ℱ ¼ f g *Fx*j*x*∈ X indexed by covariate *x*. In general, there are several representations of the DP such as Polya Urn, Levy measure, and stick-breaking representations [23]. Here, we use Sethuraman's stickbreaking construction to connect the DP with the DDP. The stick-breaking construction is a kind of infinite sum representation that divides the DP into two countable series, the stick-breaking weights (SBW) and the atoms. Generally, a DP is expressed as a process with two components, the mass parameter determining the weights and the base measure to generate atoms. Through the stick-breaking construction, the DDP can be easily extended from the DP. We list their comparison in **Table 1**, where we can find that the dependency among the covariates set X is realized by indexing the mass parameter and base measure with the covariate *x*∈ X. More specifically, the dependency can be characterized through the dependency among the weights and atoms in the DDP.

The DDP can be widely applied to scenarios of various dependence data structures. We review modification versions of the DDP from three categories depending on which part it modifies in the stick-breaking representation, weights, atoms, or both. The first is to impose the dependency on the atoms but keep common weights, leading to two typical representatives, ANOVA and Spatial [9, 24, 25]. The ANOVA type DDP encoded the covariate dependence in the form of regression for the atom processes. The Spatial DDP models for nonstrationary spatial random fields with heterogeneous variance. The second category is to modify the weights to be dependent but keep the common atoms. The early and typical work is the time series DDP [26]. They introduced a Markov Beta process on the weights to account for the temporal dependency. The third category is to impose dependency on both weights and atoms [27]. They constructed vector autoregressive and autoregressive models for atoms and weights, respectively. We summarize the aforementioned types of typical modifications in **Figure 2**.

#### **3. Model and Bayesian inference**

Consider a clinical trial with multiple event types, for example, the time of the *k*th recurrence of a certain disease. In the trial, *n* subjects are divided into *G* strata of treatment. Our goal is to describe the relationship between the time to the *k*th recurrence of a subject, and its treatment stratum, as well as its vector of covariates *Z*. For a certain subject, the time of recurrences, may be dependent since they occur on the same individual and thus we assume an unobservable independent sharedfrailty random effect *W* to account for this dependence. On the other hand, we may allow the conditional hazard affiliated with the script pair *kj* to imply distinct survival distributions along with the temporal order of the recurrences of the disease and for specific treatment. For the *i*th subject in the *j*th treatment stratum, at the *k*th recurrence, given the value of frailty variable *wi* and its covariate vector *zkji*, we propose the following frailty model,

$$\lambda\_{k\circ}(t|w\_i, z\_{k\circ i}) = w\_i \lambda\_{0\circ}(t) \exp\left(\boldsymbol{\beta}^T z\_{k\circ i}\right), k = 1, \cdots, K, j = 1, \cdots, G, i = 1, \cdots, n\_j. \tag{1}$$

Model (1) is called the *generalized* shared frailty model in the sense that nonproportionality among *k*-varying recurrences is allowed by the fact that the righthand baseline hazard has footnotes *k* and *j*. We allow dependency among treatment strata in the model (1). Therefore, the baseline hazard function *λ*0*kj* acts as if a nonparametric frailty random measure accounting for the dependency owing to the recurrences and treatment schemes.


**Table 1.** *Comparison of DP and DDP.*


**Figure 2.** *Workflows of representative expansions of DDP.*

Model (1) is an extension of the classical shared frailty model (4.1.1) on page 101 of [4] since the baseline hazard function there does not vary among the recurrences and the treatment strata. Model (1) has the analog spirit to the frailty model (1) in [10], whereas their treatment strata are independent.

#### **3.1 Likelihood**

The corresponding survival function of the model is given by: *Skj <sup>t</sup>*j*wi*, *zkji* � � <sup>¼</sup> *<sup>S</sup>*0*kj*ð Þ*<sup>t</sup>* � �exp *<sup>β</sup><sup>T</sup>* ð Þ *zkji*þ*vi* , where *<sup>S</sup>*0*kj*ðÞ¼ *<sup>t</sup>* exp � Ð*t* <sup>0</sup>*λ*0*kj*ð Þ*<sup>s</sup> ds* � � denotes the baseline survival function of the *k*th recurrence for subjects in the *j*th treatment stratum, and *vi* ¼ log ð Þ *wi* denotes logarithm transformation of the frailty effect. Let *f*0*kj* be the corresponding baseline density function.

Given the data sample *Ykji*, *<sup>δ</sup>kji*, *zji* � �, where *Ykji* <sup>¼</sup> min *Ckji*, *Tkji* � �, *<sup>δ</sup>kji* <sup>¼</sup> *I Tkji* <sup>≤</sup>*Ckji* � �, with *Tkji* being the gap time between the (*<sup>k</sup>* � 1)th and *<sup>k</sup>*th recurrence of the *i*th subject in the *j*th stratum and *Ckij* being the corresponding censoring variable that is independent of *Tkji* given the covariate vector *zkji*, for *<sup>k</sup>* <sup>¼</sup> 1, <sup>⋯</sup>,*K*, *<sup>j</sup>* <sup>¼</sup> 1, <sup>⋯</sup>, *<sup>G</sup>*, *<sup>i</sup>* <sup>¼</sup> 1, <sup>⋯</sup>, *<sup>n</sup> <sup>j</sup>*, and <sup>P</sup>*<sup>G</sup> <sup>j</sup>*¼<sup>1</sup>*<sup>n</sup> <sup>j</sup>* <sup>¼</sup> *<sup>n</sup>*. In the *<sup>j</sup>*th stratum, suppose that there are *nkj* (*nkj* ≤ *n <sup>j</sup>*) subjects suffering from the *k*th recurrence. Then the likelihood is written as:

$$\prod\_{k=1}^{K} \prod\_{j=1}^{G} \prod\_{i=1}^{n\_{kj}} \left[ \exp \left( \boldsymbol{\beta}^{T} \boldsymbol{z}\_{kji} + \boldsymbol{v}\_{i} \right) \boldsymbol{f}\_{0kj} \left( \boldsymbol{y}\_{kji} \right) \left\{ \mathbf{S}\_{0kj} \left( \boldsymbol{y}\_{kji} \right) \right\}^{\left( \exp \left( \boldsymbol{\beta}^{T} \boldsymbol{z}\_{kji} + \boldsymbol{v}\_{i} \right) - 1 \right)} \right]^{\delta\_{kji}}$$

$$\times \left\{ \mathbf{S}\_{0kj} \left( \boldsymbol{y}\_{kji} \right) \right\}^{\left( 1 - \delta\_{kji} \right)} \exp \left( \boldsymbol{\beta}^{T} \boldsymbol{z}\_{kji} + \boldsymbol{v}\_{i} \right)$$

#### **3.2 Survival-function version of the ANOVA DDP**

In the Bayesian workflow for the estimation, prior distributions are first determined. We here specify appropriate nonparametric priors for *S*0*kj* and *f*0*kj*. Since they can be easily derived from one to the other, we here only introduce the priors for *S*0*kj*.

We divide *S*0*kj* into *K* groups, and the *k*th group has *G* baseline survival functions of different treatment strata at the *k*th time of recurrence. That is, for a fixed *<sup>k</sup>*, <sup>S</sup>*<sup>k</sup>* <sup>¼</sup> *<sup>S</sup>*0*kj*, *<sup>j</sup>* <sup>¼</sup> 1, <sup>⋯</sup>, *<sup>G</sup>* � � is a collection of baseline survival functions with length *G* indexed by the categorical covariate *j* denoting the treatment stratum. The next procedures come from the spirit of [9]. As a general example, suppose two dugs *A* and *B* will be taken in treatment, with *V* and *U* levels of doses, respectively. In this case, *G* = *VU* denotes the number of treatment strata and let the level of the *j*th stratum be (*v*,*w*). We write the stick-breaking form of *<sup>S</sup>*0*kj* such that *<sup>S</sup>*0*kj*ðÞ¼ *<sup>t</sup>* <sup>P</sup><sup>∞</sup> *<sup>h</sup>*¼<sup>1</sup>*phI t* <sup>&</sup>gt;*θkjh* � �. We impose an ANOVA structure on *θkjh*:

$$
\theta\_{kjk} = m\_{kh} + A\_{kvh} + B\_{kwh}, \tag{2}
$$

where *mkh* denotes the ANOVA effect shared by all the strata at the *k*th recurrence, and the rest terms are the ANOVA effects of the *j*th stratum at the *k*th recurrence. Let the three components be independently generated from three distributions, and marginally on *j*, the baseline survival function *S*0*kj* follows a DP. The aforementioned procedure implies that S*<sup>k</sup>* is a survival-function version of the *Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

ANOVA DDP. If we consider a single event, that is, *K* = 1, with linear effects of a regressor vector *<sup>z</sup>* involved in (2) such that *<sup>θ</sup>jh* <sup>¼</sup> *mh* <sup>þ</sup> *Avh* <sup>þ</sup> *Bwh* <sup>þ</sup> *<sup>β</sup><sup>T</sup> <sup>h</sup> Z*, then it reduces to the univariate survival regression model in [25].

Since any function in the stick-breaking form is discrete almost surely, we place a convolution through the Dirichlet process mixture (DPM) model [28]. Particularly, since the baseline survival functions are defined on the positive half-real line, the convolution kernel in DPM should be positive such as log-normal, Gamma, and Weibull. In this chapter, a log-normal kernel is considered. For different recurrences, we treat the relationship among S*ks* to be independent.

#### **3.3 One-way ANOVA DDP**

Considering the data of our interest, where only one drug and one level of dose is used in each treatment stratum, we introduce the modeling of the survival-function version of one-way ANOVA DDP. In this case, the prior for the S*<sup>k</sup>* reduces to a oneway ANOVA form since the dependency among the *G* treatment strata is explained by only one ANOVA effect. Furthermore, if we set *mkh* ¼ 0, *αkh* ¼ ð Þ *θ<sup>k</sup>*1*<sup>h</sup>*, ⋯, *θkGh T* reduces to a *G*-variate variable denoting the locations of all *G* baseline distributions and thus *<sup>θ</sup>kjh* <sup>¼</sup> *<sup>α</sup><sup>T</sup> khd <sup>j</sup>*, where *dj* is the design vector of the *j*th stratum to select the appropriate ANOVA effects corresponding to *j*.

With the above notations, we summarize the procedure to construct the survival-function version of one-way ANOVA DDP prior in model (1) as follows:


Step 1 is a standard stick-breaking representation for DP. Step 2 is kernel mixture of DP whereas the kernel is a survival function rather than a cumulative distribution function. The realization of Step 2 is quite straightforward in Stan as it provides the function lognormal\_lccdf to be used as the kernel of the survival function of the log-normal family.

In Step 3, we specify the base measure as the prior for the location and shape parameters of the log-normal kernel directly rather than adding another hyper prior distribution like [25] did. The main reason is to simplify the computation in Stan. Particularly, inspired by [30, 31], we use the half-Cauchy distribution as the noninformative prior for the variance parameter instead of the inverse Gamma prior. In our practice, the choice of half-Cauchy prior significantly improves the speed of convergence and mixture performance of the MCMC chains in our real data analysis and simulation. Another interesting point we met in numerical studies is that the informativeness of the base measure for *θ*. Here, we do not assign the noninformative distribution but a weakly informative one is considered since we find such a weakly informative prior provides better MCMC performance than that of non-informative one with a higher effective sample size and better mixture performance. In our other research experience, the weakly informative prior for the variance parameter in the mixing component of the DPM seems to be more preferable.

#### **3.4 Other priors and MCMC**

In terms of the prior for the parametric prior *wi*, we choose to log-normal prior that *vi* <sup>¼</sup> log ð Þ *wi* and *vi* � *<sup>N</sup>* 0, *<sup>τ</sup>*<sup>2</sup> ð Þ, where *<sup>τ</sup>* <sup>&</sup>gt;0 is an unknown parameter. We further assign a half Cauchy prior for *τ* s.t *τ* � Cauchyþð Þ 0, 5 as a non-informative prior. The prior for the vector of regression coefficients is *β* � *N*ð Þ 0, 1000*I* as a noninformative prior.

We use the truncated Dirichlet process to replace the infinite summand in the DP. The selection of the truncation point is often ad-hoc. Since in Stan the NUTS cannot sampler discrete parameters, we have to fit the truncation number and the mass parameter before the MCMC procedure. In general, the truncation number is set to be large enough s.t the truncated part is negligible. Gelman et al. [32] suggests using a truncation number *L* that is greater than 5*M* + 5. In our computation, we set *L* = 12.

The MCMC sampling for the posterior distribution is realized in Stan. Stan and its **R** version are widely used in statistical modeling and high-performance statistical computing, especially in Bayesian. Stan realizes the MCMC sampling through the No-U-Turn sampler (NUTS). Stan automates the deriving of the fully conditional posterior distribution and NUTS is able to obtain high effective sample size [33].

#### **3.5 Stan and NIMBLE: Programming styles**

The MCMC sampling procedure is implemented in Stan and we also tried to implement the model in NIMBLE, another contemporary Bayesian computing tool in **R**. Stan and NIMBLE are two contemporary Bayesian computing tools that have drawn arising interest for Bayesian analysis but still remain under active development [34, 35]. The main advantage of Stan and NIMBLE is that they provide clear automatic posterior sampling procedures based on their specific sampling algorithms without particular justification. Therefore, users can be released from complicated probabilistic deriving and implementation. There has been a buzz group discussion about the comparison between Stan and NIMBLE in environments like [36, 37]. One comparison of their built-in samplers is demonstrated through implementing weakly informative and informative estimation within the trimmed mean regression model setting [38]. Here we contribute a naive comparison on their programming styles based on the first two authors' experience in coding this project and using Stan and NIMBLE, respectively.

A Bayesian paradigm is made up of three main steps, the prior, likelihood, and the posterior. MCMC generates samples to approximate the posterior distribution. Therefore, what one needs to set in a Bayesian computing tool is the prior and likelihood, let alone Stan or NIMBLE. Nevertheless, Stan and NIMBLE take different programming styles in writing likelihood. In Stan, the default way to present the log-likelihood is the syntax target and users can add log contribution to it freely, which is similar to the natural language and straightforward to users whatever level of mathematical background. In NIMBLE, the default way is to transfer the likelihood into some standard distributions given by NIMBLE, which may not be friendly for users who have a relatively less mathematical background.

*Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

We take fitting the finite mixture of the Gaussian model as an example. For a fixed positive integer *<sup>L</sup>*, the distribution of *<sup>Y</sup>* is given by *FY*ðÞ¼ *<sup>s</sup>* <sup>P</sup>*<sup>L</sup> <sup>l</sup>*¼<sup>1</sup>*pl N s*j*μl*, *<sup>σ</sup>*<sup>2</sup> *l* � � and the log-likelihood is log *L p*ð Þ¼ , *<sup>μ</sup>*, *<sup>σ</sup>*j*<sup>Y</sup>* <sup>P</sup>*<sup>n</sup> i*¼1 P*<sup>L</sup> <sup>l</sup>*¼<sup>1</sup> log *pl* � � <sup>þ</sup> log *<sup>ϕ</sup> yi* j*μ*1, *σ<sup>l</sup>* � � � � , where *ϕ* denotes the density function of normal distribution. The code for Stan and NIMBLE to implement this model is listed in Listing 1.1 and 1.2, respectively. In Listing 1.1 we clearly find that the contribution to the syntax target is just the sum of log(*pl*) and the logarithm of the density of normal distribution denoted by normal\_lpdf. The rest is to assign a Dirichlet prior to the weights *pl* and other parameters. However, in the NIMBLE code shown in Listing 1.2, we have to transfer the likelihood into some sampling procedures by IMAGING that there are *L* clusters of random numbers, the random numbers are i.i.d Gaussian within each cluster, and the probability a random number is drawn from the *l*th cluster is *pl*. Thereafter, the Dirichlet prior is assigned to *pl*s. Such imagine matches the Bayesian philosophy but when the likelihood function becomes to be quite complicated, to understand this sampling procedure may not be easy anymore, especially for practitioners not coming from a mathematics or statistics background.

**Listing 1.1**: *Stan code for modeling mixture of Gaussian distribution*

```
1 data {
 2 int<lower=1> N;
 3 vector[N] y;
 4 int<lower=1> L;
 5 }
 6 parameters {
 7 simplex[L] p;
 8 vector[L] mu;
 9 vector<lower=0>[L] sigma;
10 }
11 model {
12 p � dirichlet(rep_vector(1, L));
13 mu � normal(0, 100);
14 sigma � cauchy(0, 2.5);
15 for(i in 1:N) {
16 vector[L] lp_i;
17 for(l in 1:L) {
18 lp_i[l] = log(p[l]) + normal_lpdf(y[i]|mu[l], sigma[l]);
19 }
20 target += log_sum_exp(lp_i);
21 }
22 }
```

```
Listing 1.2: NIMBLE code for modeling mixture of Gaussian distribution
```

```
1 NimbleCode <- nimbleCode ({
2 for (i in 1:N) {
3 y[i] � dnorm(mu_y[z[i]], tau = tau_y[z[i]])
4 z[i] � dcat(p[1,L])
5 }
6 for (j in 1:L) {
```
 mu\_y[j] dnorm(0, 0.01) tau\_y[j] dgamma(0.01, 0.01) 9 } p[1:L] ddirch(alpha0[1,L]) 11 }) NimbleData <- list(y = y) NimbleConsts <- list(L = L, N = length(NimbleData\$y), alpha0 = rep(1, L)) NimbleInits <- list(mu\_y = rnorm(NimbleConsts\$L), tau\_y = rgamma (NimbleConsts\$L),p = rep(1/NimbleConsts\$L, NimbleConsts\$L))

#### **4. Application: Bladder cancer recurrences**

We apply the GSFM to analyze the bladder cancer recurrences data set contained in R package survival. Totally 118 subjects in the clinical trial are divided into three treatment strata including placebo, pyridoxine (vitamin B6), and thiotepa. Each subject may experience *k* (from 1 to 9) times of recurrences and may die from or not from the recurrence of bladder cancer. We do not discriminate the death from cancer and the recurrence, and the death from other causes is treated as censoring status. Our interest is the gap time between the (*k* 1)th and the *k*th recurrences. Besides the treatment schemes, two clinical covariates are considered: the number of tumors at the beginning (*x*1) and the size of the largest tumor (*x*2) within a subject. The values of these two covariates are evaluated at the beginning of each recurrence interval. This data set was once analyzed for the time between the first to the second recurrence as a univariate time-to-event outcome [39]. In this chapter, we consider both the first and the second recurrences and thus *K* = 2 here. The two covariates are scaled by divided by 100. To simplify the computation, the follow-up time is transferred from months to years to get lower scalars.

#### **4.1 Model-checking for baseline survival functions**

Before further inference, we need to check whether the proposed model is appropriate. As an alternative, a shared frailty model is fit by R package spBayeSurv. In the shared frailty model, the treatment strata are considered as indicator covariates in the parametric term. We run four independent MCMC chains for 5000 times with the first 2000 times burn-in and aggregate the rest chains together as the posterior samples under the GSFM. All chains are well mixed and convergent under the GSFM. For the shared frailty model, we run the MCMC 16,000 times with the first 6000 times burn-in through R function survregbayes using the "IID" Gaussian frailty under "PH" model name. Other settings are default.

The plots of the estimated baseline survival functions under different models stratified by treatment strata can be viewed in **Figure 3**. From that, we find the baseline survival functions estimated under the GSFM show similar trends as that of the K-M estimator in each recurrence and reflect the crossing survival curves at the first recurrence like the K-M estimator. However, the curves estimated by the shared frailty model are not crossed and cannot change along with recurrences. Therefore, the proposed GSFM is appropriate for the data.

#### **4.2 Parametric estimation I: Real data**

We use the mean of posterior samples (median for *τ*) as the estimator of parameter and we list the estimation of the vector of regression coefficients *β* and standard deviation parameter *τ* in **Table 2**.

#### *Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

#### **Figure 3.**

*The estimated baseline survival curves for the first (a) and second (b) recurrence; the black curves are estimated under the proposed generalized shared frailty model, and the pink curves are estimated under the traditional shared frailty model; the real lines, placebo; the dash lines, pyridoxine; the dotted lines, thiotepa.*


#### **Table 2.**

*The parametric estimation and the MCMC performance for the bladder cancer recurrences data.*

From **Table 2** we find that as the number of tumors at the start point increases, the hazard for recurrences increases as well whereas the larger size of the largest tumor will decrease the hazard. This conclusion is similar to that in [39] who analyzed the first recurrence as univariate time-to-event data by a transformation model.

Besides the parametric estimation result, we also report two metrics about the MCMC performance here. The first one is the effective sample size (ESS), an approximation to the number of "independent" draws in MCMC sampling. It shows that the ESS of all parameters is greater than 400, which is considered to be adequate by [40]. The ESS of *τ* is significantly lower than that of *β*, a possible reason is that the frailty random effect might be time-dependent *wi*(*t*) rather than a timefixed effect. Another metric of interest is the average time needed to generate each effective sample, called MCMC Pace. Stan development team emphasized the importance of MCMC Pace, and the definition is given by the team of NIMBLE in [41] as the time consuming of generating one effective sample. The MCMC Pace to generate *τ* is much higher than that of *β*, and we conjecture the possible reason is that the posterior distribution has a long upper tail leading to outliers in posterior samples, which slows down the speed to generate effective samples.

#### **4.3 Parametric estimation II: Simulation**

Another simulation study is considered to evaluate the performance of parametric estimation of the MCMC procedure. Our simulation aims to simulate the occurrences of multiple events on the same individual. We take *K* = 2 and *G* = 3 denote the number of types of events and the number of treatment strata,


*BIAS, averaged bias among the 150 simulations; RMSE, the root of mean square error of the estimation; ESD, averaged posterior estimated standard deviation; SDE, the standard deviation of point estimate; CP, the coverage probability of 95% credible interval.*

#### **Table 3.**

*Simulation results for the parametric terms.*

respectively. The simulation includes two independent covariates, *xi* � Bin 1, 0 ð Þ *:*5 and *x*<sup>2</sup> � *N*ð Þ 0, 1 to incorporate indicator variable and continuous variable as well. For *k* ¼ 1, 2, *j* ¼ 1, 2, 3, the baseline survival functions *S*0*kj* are set as:


When *k* = 1, the three baseline survival functions are crossed whereas when *<sup>k</sup>* = 2, the three curves are not. The vector of regression coefficients is *<sup>β</sup>* <sup>¼</sup> ð Þ 1, 1 *<sup>T</sup>* and the log frailty random effect *vi* � *N*ð Þ 0, 1 independently. The survival time is generated following model (1). The censoring variable of each event is generated from Unif(4,6) independently, leading to a censoring rate of about 28%. We set the number of subjects to be 90 and they are equally divided into three treatment strata. We repeat the simulation for 150 times.

**Table 3** summarizes the results for regression parameters *β* and the standard deviation of frailty effect *τ*, including the averaged bias (BIAS), the root of mean square error (RMSE), posterior estimated standard deviation (ESD) of each point estimate (posterior mean for *β* and median for *τ*), the standard deviation (across 150 replicated simulations) of the point estimate (SDE), and the coverage probability (CP) of the 95% credible interval (given by Wald-type credible interval). The results show that the point estimates of *β* and *τ* have quite little bias with low RMSE, ESD values are close to the corresponding SDEs, and the CP values are close to the nominal 95%.

#### **5. Discussion**

In this chapter, we show the power of Bayesian computing illustrated by successfully applying the ANOVA DDP model as the nonparametric prior for a relatively complicated shared frailty model. Our survival-function version of the ANOVA DDP, modified based on the ANOVA DDP directly in Subsection 3.3, is constructed for the shared frailty model, but can reduce to modeling the univariate dependent survival functions by involving the continuous covariates into the predictor space of the ANOVA DDP. Hence, our work is an extension of [25] to some extent. However, the proposed GSFM is different from the Linear DDP model for a single group which is a generalization of the accelerated failure time model [42, 43]. *Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

Furthermore, although we point out that there exists potentially dual dependence for dual stratification of treatment strata and recurrences, we just simply allow dependence in treatment strata and assume that the recurrences are independent in our methodology demonstration. The dependence across recurrences per subject is dealt with only by the parametric frailty random effect in the proposed shared frailty model. It is more reasonable to be incorporated into the baseline survival functions so that the interaction effects between recurrence and treatment may be accounted for. Under the one-level stratification, Hanson et al. [5] modeled such serial correlation among baseline hazard functions by constructing the so-called dependent tail free process as the prior. It is non-trivial to accommodate dual temporal and stratified dependency as a future research plan.

#### **Acknowledgements**

Chong Zhong's research was partially supported by GRF1531519, RGC, HKSAR. Zhihua Ma's research was partially supported by Shenzhen Institutions Stability Support Program 20200812101943002, China. Junshan Shen's research is partially supported by the Beijing Natural Science Foundation 1192006, China. Catherine Liu's research was partially supported by HKPOLYU grant YBTR, and GRF1531519, RGC, HKSAR.

#### **Thanks**

The first author Chong Zhong owes deep thanks to his parents. The authors thank Miss. Lulu Zhang for her efficient technical supports in text and figures. The authors thank the service manager Ms. Romina Rovan for her courtesy and professional service. The authors thank the invitation from the editor.

#### **Author details**

Chong Zhong1†, Zhihua Ma2†, Junshan Shen3† and Catherine Liu<sup>1</sup> \*†


† These authors contributed equally.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;**16**(3): 439-454

[2] Duchateau L, Janssen P. The Frailty Model. New York: Springer Science & Business Media; 2007

[3] Balan TA, Putter H. A tutorial on frailty models. Statistical Methods in Medical Research. 2020;**29**(11): 3424-3454

[4] Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. New York: Springer Science & Business Media; 2001

[5] Hanson TE, Jara A, Zhao L, et al. A bayesian semiparametric temporallystratified proportional hazards model with spatial frailties. Bayesian Analysis. 2012;**7**(1):147-188

[6] de Castro M, Chen M-H, Zhang Y. Bayesian path specific frailty models for multi-state survival data with applications. Biometrics. 2015;**71**(3): 760-771

[7] Therneau TM. A Package for Survival Analysis in R. R Package Version 3.2-11. 2021

[8] Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2007; **69**(4):507-564

[9] De Iorio M, Müller P, Rosner GL, MacEachern SN. An anova model for dependent random measures. Journal of the American Statistical Association. 2004;**99**(465):205-215

[10] de Castro M, Chen M-H, Ibrahim JG, Klein JP. Bayesian transformation

models for multivariate survival data. Scandinavian Journal of Statistics. 2014; **41**(1):187-199

[11] Paulon G, De Iorio M, Guglielmi A, Ieva F. Joint modeling of recurrent events and survival: A bayesian nonparametric approach. Biostatistics. 2020;**21**(1):1-14

[12] Conlon ASC, Taylor JMG, Sargent DJ. Multi-state models for colon cancer recurrence and death with a cured fraction. Statistics in Medicine. 2014; **33**(10):1750-1766

[13] Stan Development Team. The Stan Core Library. Version 2.27. 2018

[14] Stan Development Team. RStan: The R Interface to Stan. R Package Version 2.21.2. 2020

[15] Ferguson TS. Prior distributions on spaces of probability measures. The Annals of Statistics. 1974;**2**(4):615-629

[16] Teh YW, Jordan MI, Beal MJ, Blei DM. Hierarchical dirichlet processes. Journal of the American Statistical Association. 2006;**101**(476):1566-1581

[17] Rodriguez A, Dunson DB, Gelfand AE. The nested dirichlet process. Journal of the American Statistical Association. 2008;**103**(483):1131-1154

[18] MacEachern SN. Dependent nonparametric processes. In: ASA Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association; 1999

[19] MacEachern SN. Dependent Dirichlet Processes. Technical Report. Department of Statistics, The Ohio State University; 2000

[20] MacEachern SN. Nonparametric bayesian methods: A gentle introduction *Dependent Dirichlet Processes for Analysis of a Generalized Shared Frailty Model DOI: http://dx.doi.org/10.5772/intechopen.101502*

and overview. Communications for Statistical Applications and Methods. 2016;**23**(6):445-466

[21] Quintana FA, Mueller P, Jara A, MacEachern SN. The dependent dirichlet process and related models. arXiv preprint arXiv:2007.06129. 2020

[22] Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. JSTOR. 1994;**4**:639-650

[23] Phadia EG. Prior Processes and their Applications. New York: Springer; 2015

[24] Gelfand AE, Kottas A, MacEachern SN. Bayesian nonparametric spatial modeling with dirichlet process mixing. Journal of the American Statistical Association. 2005;**100**(471):1021-1035

[25] De Iorio M, Johnson WO, Müller P, Rosner GL. Bayesian nonparametric nonproportional hazards survival modeling. Biometrics. 2009;**65**(3): 762-771

[26] Nieto-Barajas LE, Müller P, Ji Y, Lu Y, Mills GB. A time-series ddp for functional proteomics profiles. Biometrics. 2012;**68**(3):859-868

[27] DeYoreo M, Kottas A. Modeling for dynamic ordinal regression relationships: An application to estimating maturity of rockfish in california. Journal of the American Statistical Association. 2018;**113**(521): 68-80

[28] Lo AY. On a class of bayesian nonparametric estimates: I. Density estimates. The Annals of Statistics. 1984; **12**:351-357

[29] Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. Florida: CRC Press; 2013

[30] Gelman A. Prior distributions for variance parameters in hierarchical

models (comment on article by browne and draper). Bayesian Analysis. 2006; **1**(3):515-534

[31] Gelman A, Jakulin A, Pittau MG, Su Y-S. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics. 2008;**2**(4):1360-1383

[32] Ohlssen DI, Sharples LD, Spiegelhalter DJ. Flexible randomeffects models using bayesian semiparametric models: Applications to institutional comparisons. Statistics in Medicine. 2007;**26**(9):2088-2112

[33] Hoffman MD, Gelman A, et al. The no-u-turn sampler: Adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research. 2014;**15**(1):1593-1623

[34] Kerioui M, Mercier F, Bertrand J, Tardivon C, Bruno R, Guedj J, et al. Bayesian inference using hamiltonian monte-carlo algorithm for nonlinear joint modeling in the context of cancer immunotherapy. Statistics in Medicine. 2020;**39**(30):4853-4868

[35] Ma Z, Hu G, Chen M-H. Bayesian hierarchical spatial regression models for spatial data in the presence of missing covariates with applications. Applied Stochastic Models in Business and Industry. 2021;**37**(2):342-359

[36] Stan forums. Available from: https://discourse.mc-stan.org

[37] Nimble groups. Available from: https://r-nimble.org/more/issues-andgroups

[38] Zhang L. A bayesian comparison in stan and nimble by trimmed mean regression [M.Phil's thesis]. Hong Kong, China: The Hong Kong Polytechnic University; 2021

[39] Zeng D, Lin DY. Efficient estimation of semiparametric

transformation models for counting processes. Biometrika. 2006;**93**(3): 627-640

[40] Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C. Ranknormalization, folding, and localization: An improved r for assessing convergence of MCMC. Bayesian Analysis. 2021;**1**(1):1-28

[41] de Valpine P, Paciorek C, Turek D, Michaud N, Anderson-Bergman C, Obermeyer F, Cortes CW, RodrÃguez A, Lang DT, Paganin S. NIMBLE User Manual. R Package Manual Version 0.11.1. 2021

[42] Hanson TE, Jara A. Surviving fully bayesian nonparametric regression models. Bayesian Theory and Applications. 2013:593-615

[43] Riva-Palacio A, Leisen F, Griffin J. Survival regression models with dependent bayesian nonparametric priors. Journal of the American Statistical Association. 2021:1-10

#### **Chapter 6**

## Modeling Heterogeneity Using Lindley Distribution

*Arvind Pandey and Lalpawimawha*

#### **Abstract**

Frailty models are intended for use in survival analysis to explain unobserved heterogeneity in an individual caused by various hereditary variables or environmental influences. A shared frailty model was utilized to examine the data. It is based on the idea that frailty affects the hazard rate in a multiplicative manner. In this manuscript, we introduce a new frailty model called the Lindley shared frailty model with exponential power and generalized Rayleigh as baseline distributions. The The Bayesian method of the Monte Carlo method of the Markov chain is used to estimate the parameters used in the model; simulation studies are also carried out to compare the actual and calculated values of the parameters; the proposed model is compared with the Bayesian comparison method Compare and propose the best model of infectious disease data.

**Keywords:** Bayesian technique, exponential power distribution, generalized Rayleigh distribution, Lindley frailty, MCMC

#### **1. Introduction**

The term frailty was coined by Vaupel et al. [1]. The frailty model is typically represented as an unobservable random variable that multiplies the risk function, with the frailty random variable supposed to be one of the parameter distributions, such as gamma, log-normal, positive stable, inverse Gaussian, power variance function, and so on. Let *Y* be a continuous random variable of lifetime of an individual and the frailty random variable (RV) be *V*. The conditional hazard function (CHF) for a given frailty variable *V* ¼ *v* at time *y* >0 is

$$
\nu m(\mathcal{y}|\upsilon) = \upsilon h\_0(\mathcal{y}) e^{X'\beta},\tag{1}
$$

where *m*0ð Þ*y* is a baseline hazard function (BHF) at time *y* >0, *X* is a covariate and *β* is a regression coefficient, these are in vector form. The CHF for given frailty at time *y*> 0 is

$$S(\mathcal{Y}|\mathcal{v}) = \mathfrak{e}^{-\int\_{0}^{\mathcal{V}} m(\mathbf{x}|\mathcal{v})d\mathbf{x}} = \mathfrak{e}^{-v\mathcal{M}\_0(\mathcal{y})\mathcal{e}^{\mathcal{X}^\beta}},\tag{2}$$

where *M*0ð Þ*y* is cumulative baseline hazard function (CBHF) at time *y*>0. Integrating over the range of frailty variable *V* having density *f v*ð Þ, we get marginal survival function as

$$\mathcal{S}(\mathbf{y}) = \int\_0^\infty \mathcal{S}(\mathbf{y}|\upsilon) f(\upsilon) d\upsilon,\tag{3}$$

$$=L\_v\left(M\_0(\mathcal{y})e^{X^\prime \beta}\right),\tag{4}$$

where *LV*ð Þ*:* is a Laplace transformation of the distribution of *V*. Once we have survival function at time *y*>0 of lifetime random variable of an individual one can obtain probability structure and can base their inference on it.

Frailty models have gained more attention in the recent medical research due to the uniqueness property of the frailty parameter. Generally, gamma distribution, log-normal distribution and inverse Gaussian distributions are the most commonly used frailty distributions [2, 3]. Hanagal and Dabade [4] introduced new Compound negative binomial shared frailty models for bivariate survival data using Weibull and generalized exponential as baseline distributions. Pandey et al. [5] compared gamma, inverse Gaussian and positive frailty models with generalized Pareto as baseline distribution. Pandey et al. [6] also compared gamma and inverse Gaussian frailty distributions under additive property.

To extract the features of the Lindley shared frailty model, we used Lindley as a frailty distribution with right censored data under generalized Rayleigh and exponential power as baseline distributions. The survival periods are dependent in this case because the frailty variable follows the Lindley distribution. The predicted value of the frailty distribution variance influences the population's degree of heterogeneity. The higher the variance of the frailty distribution, the more heterogeneity there is in the population under consideration. The frailty distribution becomes degraded when zero variance is observed. As a baseline distribution, the exponential power distribution is used. Because it exhibits a rising hazard rate, which is typical in real-life distributions, the exponential power distribution is chosen as the baseline distribution. The Lindley distribution with one parameter was first proposed by Lindley [7] for analyzing failure times data. It belongs to an exponential family, but it is used as an alternative to the exponential distribution. Lindley distribution is alluring due to the ability of modeling failure time data with increasing, decreasing, unimodal and bathtub shaped hazard rates. Ghitany et al. [8, 9] discussed different properties of Lindley distribution and also showed that Lindley distribution is better than the exponential distribution for modeling failure time data when considering hazard rate is unimodal or bathtub shaped. It is also shown that Lindley distribution is more flexible than exponential distribution in modeling lifetime data. Many authors have discussed and introduced different generalization of Lindley distribution. Bakouch et al. [10] introduced extended Lindley distribution. Ghitany et al. [11] proposed the power Lindley distribution and Shanker et al. [12] proposed two parameter Lindley distribution, which could also be reduced to one parameter case. The mean of a two parameter Lindley distribution is always greater than the mode indicating that the distribution is positively skewed.

The classic approach and the Bayesian approach are two widely utilized techniques in general. We can employ prior distributions here, therefore we'll estimate the model parameters using the Bayesian Markov Chain Monte Carlo (MCMC) approach. Furthermore, because characteristics with diverse posterior distributions may be easily generated, the results and model selection criteria can be clearly interpreted. Run after thinning mean and autocorrelation plots, follow-up plots, past plot couplings, sample autocorrelation plots dictate chain behavior, burn duration, autocorrelation delay, and how observations are made It's utilized for cognitive confirmation on its own. We also give simulation experiments to back up the model's performance. All of the model's estimation processes are detailed, as well as infection statistics relating to kidney infections.

*Modeling Heterogeneity Using Lindley Distribution DOI: http://dx.doi.org/10.5772/intechopen.100340*

In Sections 2 and 3, the introduction of the Lindley shared frailty model and baseline distributions are given, followed by proposed models and estimation strategies in Sections 4 and 5. In Sections 6 and 7, application of the proposed model and discussion are given.

#### **2. Lindley shared frailty model**

Let a continuous random variable *V* follows two parameter Lindley distribution (TPLDP) with parameters *α* and *λ* then density function of *V* is

$$f(v) = \begin{cases} \frac{a^2}{a\lambda + 1}(\lambda + v)e^{-\lambda v} & ; \ v > 0, a > 0, \lambda a > -1 \\\\ 0 & ; \text{ otherwise}, \end{cases} \tag{5}$$

and the Laplace transform is

$$L\_V(s) = \frac{a^2(\mathbf{1} + (s + a)\boldsymbol{\lambda})}{\left(s + a\right)^2(\mathbf{1} + \boldsymbol{\lambda}a)}, s + a > 0. \tag{6}$$

The mean and variance of frailty variable are **<sup>E</sup>**ð Þ¼ *<sup>Z</sup> αλ*þ<sup>2</sup> *α αλ* ð Þ <sup>þ</sup><sup>1</sup> and **<sup>V</sup>**ð Þ¼ *<sup>V</sup>* <sup>2</sup>þ4*αλ*þ*α*2*λ*<sup>2</sup> *<sup>α</sup>*2ð Þ *αλ*þ<sup>1</sup> <sup>2</sup> . For identifiability, we assume *<sup>V</sup>* has expected value equal to one i.e. *E V*ð Þ¼ 1, which imply that *<sup>α</sup>* <sup>=</sup> *<sup>ξ</sup>* and *<sup>λ</sup>*<sup>=</sup> <sup>2</sup>�*<sup>ξ</sup> ξ ξ*ð Þ �<sup>1</sup> . Under this restriction the density function and the Laplace transformation of Lindley distribution reduces to

$$f(v) = \begin{cases} \frac{e^{-\xi v} \xi \left(\xi^2 v + \xi - \xi v - 2\right)}{\xi - 2} & ; \; v > 0, \xi > 0\\ 0 & ; \; \text{otherwise}, \end{cases} \tag{7}$$

and the Laplace transform is

$$L\_V(s) = \frac{\xi(\xi + s(2 - \xi))}{\left(s + \xi\right)^2}. \tag{8}$$

with variance of *V* is <sup>4</sup>*ξ*�*ξ*2�<sup>2</sup> *<sup>ξ</sup>*<sup>2</sup> . The frailty variable *V* is degenerate at *V* ¼ 1. Replacing Laplace transform in Eq. (4), we get the unconditional bivariate survival function for the *j th* individual as

$$S(\boldsymbol{y}\_{1k}, \boldsymbol{y}\_{2k}) = \frac{\xi\left(\xi + \eta\_k \left(\mathcal{M}\_{01}(\boldsymbol{y}\_{1k}) + \mathcal{M}\_{02}(\boldsymbol{y}\_{2k})\right) (\mathcal{Z} - \xi)\right)}{\left(\eta\_k \left(\mathcal{M}\_{01}(\boldsymbol{y}\_{1k}) + \mathcal{M}\_{02}(\boldsymbol{y}\_{2k})\right) + \xi\right)^2} \tag{9}$$

where *M*<sup>01</sup> *y*1*<sup>k</sup>* � � and *<sup>M</sup>*<sup>02</sup> *<sup>y</sup>*2*<sup>k</sup>* � � are the cumulative baseline hazard functions of the lifetime *Y*1*<sup>k</sup>* and *Y*2*<sup>k</sup>* respectively.

And for without frailty, the model becomes

$$S(\mathcal{Y}\_{1k}, \mathcal{Y}\_{2k}) = e^{-\eta\_k \left(M\_{01}(\mathcal{Y}\_{1k}) + M\_{02}(\mathcal{Y}\_{2k})\right)}.\tag{10}$$

#### **3. Baseline distributions**

As a starting point, we'll look at the generalized Rayleigh distribution. Surles and Padgett [13] proposed the two-parameter Burr type X distribution, dubbed the generalized Rayleigh distribution, and demonstrated that the two-parameter generalized Rayleigh distribution may be utilized to describe strength and general lifetime data rather efficiently. The two-parameter generalized Rayleigh distribution can be utilized well in survival analysis to describe strength data as well as general lifetime data. If a continuous random variable *Y* has a two-parameter generalized Rayleigh distribution, the survival function, hazard function, and cumulative hazard function are as follows:

$$S(\boldsymbol{y}) = \mathbf{1} - \left(\mathbf{1} - e^{-(\boldsymbol{\lambda}\boldsymbol{y})^2}\right)^a; \boldsymbol{y} > \mathbf{0}, \boldsymbol{\lambda} > \mathbf{0}, a > \mathbf{0} \tag{11}$$

$$m(\mathbf{y}) = \frac{2a\lambda^2 y e^{-\left(\lambda \mathbf{y}\right)^2} \left(\mathbf{1} - e^{-\left(\lambda \mathbf{y}\right)^2}\right)^{a-1}}{\mathbf{1} - \left(\mathbf{1} - e^{-\left(\lambda \mathbf{y}\right)^2}\right)^a}; y > 0, \lambda > 0, a > 0 \tag{12}$$

$$M(\boldsymbol{y}) = -\log\left[\mathbf{1} - \left(\mathbf{1} - e^{-\left(\lambda \boldsymbol{y}\right)^{2}}\right)^{a}\right]; \boldsymbol{y} > \mathbf{0}, \lambda > \mathbf{0}, a > \mathbf{0} \tag{13}$$

where *α* and *λ* stands for shape and scale parameters respectively of the distribution. It has also some attractive properties increasing hazard and bathtub type depends on the parameter value.

The second baseline distribution considered here is exponential power distribution. A continuous random variable *Y* is said to follow the exponential power distribution if its survival function, hazard function and cumulative hazard function are, respectively,

$$S(y) = e^{1 - e^{\psi^a}} ; y > 0, \lambda > 0, a > 0 \tag{14}$$

$$m(\mathbf{y}) = a\lambda \mathbf{y}^{a-1} e^{\lambda \mathbf{y}^a}; \mathbf{y} > \mathbf{0}, \lambda > \mathbf{0}, a > \mathbf{0} \tag{15}$$

$$M(y) = e^{\lambda y^a} - \mathbf{1} \tag{16}$$

where *λ* and *α* are the shape and scale parameters of the exponential power distribution. The hazard function and cumulative hazard function are respectively,

$$m(\mathbf{y}) = a\lambda \mathbf{y}^{a-1} e^{\lambda \mathbf{y}^a}; \mathbf{y} > \mathbf{0}, \lambda > \mathbf{0}, a > \mathbf{0} \tag{17}$$

$$\mathbf{M}(y) = e^{\lambda^{y^a}} - \mathbf{1} \tag{18}$$

The hazard function is decreasing function at time *y* when *α*<1 for smaller values of *λ* but as *λ* increases hazard function takes *U* shape curve and further increment in *λ* gives increasing nature to hazard function.

#### **4. Proposed models**

The unconditional survival function is obtained by replacing the cumulative hazard functions of generalized Rayleigh distribution and exponential power distribution in Eqs. (9) and (10). Then,

$$\begin{aligned} S(\boldsymbol{y}\_{1k}, \boldsymbol{y}\_{2k}) &= e^{-\left( \left( -\log \left[ 1 - \left( 1 - e^{-\left( \boldsymbol{\lambda}\_{2k} \right)^2} \right)^{\alpha\_1} \right] \right) + \left( -\log \left[ 1 - \left( 1 - e^{-\left( \boldsymbol{\lambda}\_{2k} \right)^2} \right)^{\alpha\_2} \right] \right) \right)} \right)} \end{aligned} \tag{19}$$

$$\left[ \mathbf{1} + \xi \left( \left( -\log \left[ 1 - \left( 1 - e^{-\left( \boldsymbol{\lambda}\_{\mathrm{Va}} \right)^2} \right)^{\alpha\_1} \right] \right) + \left( -\log \left[ 1 - \left( 1 - e^{-\left( \boldsymbol{\lambda}\_{\mathrm{Va}} \right)^2} \right)^{\alpha\_2} \right] \right) \right) \right) \right]^{-1/\xi} \tag{10}$$

$$\left(y\_{1k}, y\_{2k}\right) = e^{-\left(\frac{\sigma\_1}{\lambda\_1} \left(e^{\lambda\_{1^2k}} - 1\right) + \frac{\sigma\_2}{\lambda\_2} \left(e^{\lambda\_{2^2k}} - 1\right)\right)\eta\_k} \tag{20}$$

$$\left[\mathbf{1} + \xi \left(\frac{\alpha\_1}{\lambda\_1} \left(e^{\lambda\_1 y\_{1k}} - \mathbf{1}\right) + \frac{\alpha\_2}{\lambda\_2} \left(e^{\lambda\_2 y\_{2k}} - \mathbf{1}\right)\right)\right]^{-\gamma}$$

$$\left(\left(\frac{\alpha\_1}{\lambda\_1} \left(e^{\lambda\_1 y\_{1k}} - \mathbf{1}\right) + \frac{\alpha\_2}{\lambda\_2} \left(e^{\lambda\_2 y\_{2k}} - \mathbf{1}\right)\right)\right)^{-\gamma}$$

$$\mathcal{S}(\boldsymbol{y}\_{1k}, \boldsymbol{y}\_{2k}) = \boldsymbol{e}^{-\left(\left(-\log\left[1-\left(1-\boldsymbol{\epsilon}^{-\left(\boldsymbol{\lambda}\_{1}p\_{1k}\right)^{2}}\right)^{\sigma\_{1}}\right]\right)+\left(-\log\left[1-\left(1-\boldsymbol{\epsilon}^{-\left(\boldsymbol{\lambda}\_{2}p\_{2k}\right)^{2}}\right)^{\sigma\_{2}}\right]\right)\right)}\right)\eta\_{k} \tag{21}$$

$$S(\mathcal{Y}\_{1k}, \mathcal{Y}\_{2k}) = e^{-\left(\frac{a\_1}{\lambda\_1} (e^{\lambda\_1 \eta\_k} - 1) + \frac{a\_2}{\lambda\_2} (e^{\lambda\_2 \eta\_k} - 1)\right) \eta\_k} \tag{22}$$

$$I(\underline{\Psi}, \beta, \xi) = \prod\_{k=1}^{n\_1} f\_1(y\_{1k}, y\_{2k}) \prod\_{k=1}^{n\_2} f\_2(y\_{1k}, d\_{2k}) \prod\_{k=1}^{n\_3} f\_3(d\_{1k}, y\_{2k}) \prod\_{k=1}^{n\_4} f\_4(d\_{1k}, d\_{2k}) \tag{23}$$

$$I(\underline{\Psi}, \beta) = \prod\_{k=1}^{n\_1} f\_1(y\_{1k}, y\_{2k}) \prod\_{k=1}^{n\_2} f\_2(y\_{1k}, d\_{2k}) \prod\_{k=1}^{n\_3} f\_3(d\_{1k}, y\_{2k}) \prod\_{k=1}^{n\_4} f\_4(d\_{1k}, d\_{2k}) \tag{24}$$

$$\begin{aligned} f\_1(\boldsymbol{\nu}\_{1k}, \boldsymbol{\nu}\_{2k}) &= \frac{\partial^2 \mathcal{S}(\boldsymbol{\nu}\_{1k}, \boldsymbol{\nu}\_{2k})}{\partial \boldsymbol{\nu}\_{1k} \partial \boldsymbol{\nu}\_{2k}} \\ f\_2(\boldsymbol{\nu}\_{1k}, d\_{2k}) &= -\frac{\partial \mathcal{S}(\boldsymbol{\nu}\_{1k}, d\_{2k})}{\partial \boldsymbol{\nu}\_{1k}} \\ f\_3(d\_{1k}, \boldsymbol{\nu}\_{2k}) &= -\frac{\partial \mathcal{S}(d\_{1k}, \boldsymbol{\nu}\_{2k})}{\partial \boldsymbol{\nu}\_{2k}} \\ f\_4(d\_{1k}, d\_{2k}) &= \mathcal{S}(d\_{1k}, d\_{2k}) \end{aligned} \tag{25}$$

Putting Eq. (24) in Eqs. (23) and (24), we get the likelihood functions for the Lindley shared frailty models under generalized Rayleigh and exponential power baseline distributions and likelihood function for without frailty models under the same baseline distributions.

The joint posterior density of the parameters given failure times is given as

$$\begin{aligned} \mathfrak{a}(\alpha\_1, \lambda\_1, \alpha\_2, \lambda\_2, \xi, \beta) &\lhd L(\alpha\_1, \lambda\_1, \alpha\_2, \lambda\_2, \xi, \beta) \\ &\times \mathfrak{g}\_1(\alpha\_1) \mathfrak{g}\_2(\lambda\_1) \mathfrak{g}\_3(\alpha\_2) \mathfrak{g}\_4(\lambda\_2) \mathfrak{g}\_5(\xi) \prod\_{i=1}^{\mathfrak{g}} p\_i(\beta\_i) \end{aligned}$$

where *gi* ð Þ*:* (*i* ¼ 1, 2, ⋯, 5) represent the prior density function of baseline parameters and frailty variance, which are suppose to have known hyper parameters; *pi* ð Þ*:* represents prior density function for the regression coefficient *βi*; *β<sup>i</sup>* represents regression coefficients of vector form except *βi*, *i* ¼ 1, 2, … , *a* and likelihood function *I*ð Þ*:* is also presented by Eqs. (23) and (24). It is assumed that all of the parameters are distributed independently in this case.

The expression of the likelihood function in Eqs. (23) and (24) are not easy to solve by using the Newton–Raphson method. MLEs fail to converge as it involved a large number of parameters. As a result, the Bayesian approach was used to estimate the parameters involved in the models, which is free of such issues.

Prior distributions are utilized as follows: for a frailty parameter with a small value of Ψ, a gamma distribution with mean 1 and big variance Γ Ψð Þ , Ψ is used as a prior distribution. As a prior for the regression coefficient, say *φ*2, a normal distribution with mean zero and huge variance is utilized. Because we do not know anything about the baseline parameters, we use the same type of prior distributions used by Ibrahim et al. [14] and Sahu et al. [15], as well as a non-informative prior. As non-informative prior distributions, Γð Þ *a*1, *b*<sup>1</sup> and *U a*ð Þ 2, *b*<sup>2</sup> are utilized. All the hyper-parameters Ψ, *φ*, *a*1, *a*2, *b*<sup>1</sup> and *b*<sup>2</sup> are supposed to be known in advanced. Here Γð Þ *a*1, *b*<sup>1</sup> stands for gamma distribution having shape parameter *a*<sup>1</sup> and scale parameter *b*<sup>1</sup> and *U a*ð Þ 2, *b*<sup>2</sup> stands for the uniform distribution over the interval *a*<sup>2</sup> to *<sup>b</sup>*2. We provide the hyper-parameters as <sup>Ψ</sup> <sup>¼</sup> <sup>0</sup>*:*0001, *<sup>φ</sup>*<sup>2</sup> <sup>¼</sup> 1000, *<sup>a</sup>*<sup>1</sup> <sup>¼</sup> 1, *<sup>b</sup>*<sup>1</sup> <sup>¼</sup> 0*:*0001, *a*<sup>2</sup> ¼ 0, and *b*<sup>2</sup> ¼ 100.

The Metropolis Hasting Algorithm and Gibbs Sampler were used to estimate the parameters in the models fitted with the preceding prior density function and likelihood Eqs. (23) and (24), Metropolis Hasting Algorithm and Gibbs Sampler was utilized. Geweke test and Gelman-Rubin statistics, as suggested by Geweke [16] and Gelman et al. [17], show that the Markov chain converges to a stationary distribution. We used trace plots, coupling from the past plots, and sample autocorrelation plots to examine the chain's behavior, as well as to determine the burn-in period and autocorrelation lag.

It is important to decide which model provides the best fit to the dataset, the comparison of models was done using Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), Deviance Information Criteria (DIC) and Bayes factor.

#### **6. Application in real life data**

The models' applicability was tested by applying them to infectious illness data relating to kidney infection that occurred during catheter implantation [18]. It includes 38 patients' first and second recurrence times of infection from catheters used with portable dialysis equipment. For each patient in a cluster, these two times of infection are clustered together. Other pertinent data includes infection duration, *Modeling Heterogeneity Using Lindley Distribution DOI: http://dx.doi.org/10.5772/intechopen.100340*

patient age, gender (0 for male and 1 for female), and illness kinds such as Glomerulo Nephritis (GN), Acute Nephritis (AN), and Polycystic Kidney Disease (PKD).

To begin, the Kolmogorov–Smirnov test is used to determine the goodness of fit for kidney infection data, and the p-values obtained for the first and second recurrences are large enough to rule out the hypothesis that the first and second recurrence times follow the distributions with survival functions as given in Eqs. (11) and (14) in univariate case and it is also assumed to be appropriate for bivariate case. The corresponding p-values are given in **Table 1**. The posterior summary of the proposed models are presented in **Tables 2** and **3**. It consists of estimate (posterior mean), standard error, 95% lower and upper credible limits, GR statistics values with p-values and Geweke test values. From **Tables 2** and **3**, It is observed that we can observe that regression coefficients for all the models are more or less same. Also for all these proposed models, the value zero is not a credible value for all the credible intervals of the regression coefficients, so all the covariates are seems to be significant. To test the models' accuracy, we created 95% and 50% predictive intervals from a generated random sample based on a predictive distribution as described by Sahu et al. [15], and counted the total number of actual recurrence times for first and second kidney infections that fell within the intervals. The 95 percent and 50 percent predictive intervals are contained in the 95% and 50% predictive intervals for Models I and II, respectively, 76, 58, and 76, 60 out of 76


#### **Table 1.**

*p-Values of K-S Statistics for goodness of fit test for kidney infection data set.*


#### **Table 2.**

*Posterior results with baseline generalized Rayleigh distribution.*


#### **Table 3.**

*Posterior results with baseline exponential power distribution.*

observations. This demonstrates that the two models are appropriate for the data. Model-I is a better model in terms of AIC, BIC, and DIC values, since it has lower AIC, BIC, and DIC values than Model-II in **Table 4**. However, because the difference between AIC, BIC, and DIC values for Model I and Model II is so little, AIC, BIC, and DIC values are not suitable for deciding between the two models. To compare model *u* with model *v*, we use the Bayes factor (**Table 5**). Model-I is better than Model-II, since the equivalent value of 2 *log B*ð Þ *uv* is larger than 10, suggesting that there is a very strong positive to favor Model-I over Model-II for the provided dataset, confirming our earlier conclusion in **Table 4**. As a result of all of the demonstrated comparison criteria, we can conclude that Model-I is superior to Model-II in terms of modeling kidney infection data.


#### **Table 4.**

*AIC, BIC and DIC values for all models.*


#### **Table 5.**

*Bayes factor values and decision for test of significance for frailty fitted to kidney infection data set.*

### **7. Discussion**

In this study, we examined a new Lindley shared frailty model under generalized Rayleigh and exponential power as baseline distributions.

To suit all of the proposed models, the Metropolis-Hastings and Gibbs sampler was used. The proposed models were used to assess kidney infection data, and the best model was suggested. To conduct the analysis, we used self-composed programs in the R statistical software.

All of the exhibited comparison criteria indicated that the Lindley shared frailty model with generalized Rayleigh baseline distribution is superior to exponential power baseline distribution and without frailty models for modeling kidney infection data under the identical baseline distributions. The estimates of frailty variance are 0.9415 and 0.9739, which are high in all the proposed models indicating that there is a strong evidence of a high degree of heterogeneity among the patients in the population. A few patients are anticipated to be exceptionally inclined to infection compared to others with the same covariate values. Some patients are expected to be very prone to infection compared to others with the same covariate values. Also we can say that there is a strong positive correlation between the two infection times for the same patient.

The most important properties of the proposed models that were not mentioned in the previous study are the estimates of the frailty variances are high in all proposed models as compared to previous study given by McGilchrist and Aisbett [18] on log-normal frailty, Hanagal and Bhambure [19], the disease type GN and AN has lower infection rates as compared to other covariates. All the covariates are significant factors for kidney infection, but the disease type are insignificant in the previous proposed frailty models (see [4]). It is very crucial to be mention that Lindly shared frailty model based on generalized Rayleigh baseline distribution is performed better to analyze kidney infection data than other frailty models [4, 19].

## **Author details**

Arvind Pandey<sup>1</sup> and Lalpawimawha<sup>2</sup> \*


\*Address all correspondence to: raltelalpawimawha08@gmail.com

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Vaupel, J. W., Manton, K. G., Stallard, E. 1979. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3), 439-454.

[2] Gupta, R.D., and Kundu, D. (2001). Exponentiated Exponential Family: An Alternative to Gamma and Weibull Distributions, Biometrical Journal,43 (1), 117–130.

[3] Kheiri, S., A. Kimber, and M.R. Meshkani. 2007. Bayesian analysis of an inverse Gaussian correlated frailty model. Computational Statistics and Data Analysis 51: 5317-5326.

[4] Hanagal, D.D., Dabade, A.D. 2013. Compound negative binomial shared frailty models for bivariate survival data. Statistics and Probability Letters, 83, 2507-2515.

[5] Pandey, A., Bhushan, S., Lalpawimawha, R. 2018. Shared frailty models with baseline generalized Pareto distribution, Communications in Statistics-Theory and Methods, DOI: 10.1080/03610926.2018.1500597

[6] Pandey, A., Bhushan, S., Lalpawimawha, R., Misra, P.K. 2019. Comparison of additive shared frailty models under Lindley baseline distribution, Communications in Statistics-Simulation and Computation, DOI: 10.1080/ 03610918.2019.1664573

[7] Lindley, D.V. 1958. Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B, 20, 102-107.

[8] Ghitany, M.E., Atieh, B., Nadarajah, S. 2008. Lindley Distribution and Its Applications. Mathematics and Computers in Simulation, 78(4), 2008, 493-506.

[9] Ghitany, M.E., Alqallaf, F., Al-Mutairi, D.K., Hussain, H. 2011. A Two Parameter Weighted Lindley Distribu-tion and Its Applications to Survival Data. Mathematics and Computers in Simulation, 81(6), 1190-1201.

[10] Bakouch, H.S., Al-Zahrani, B.M., Al-Sho, A.A., Marchi, A.A., Louzada, F. 2012. An Extended Lindley Distribution. Journal of the Korean Statistical Society, 41(1), 75-85.

[11] Ghitany, M., Al-Mutairi, D., Balakrishnan, N. and Al-Enezi, I., 2013. Power Lindley distribution and associated inference. Computational Statistics and Data Analysis, 64, 20-33.

[12] Shanker, R., Sharma, S., Shanker, R. 2013. A Two-Parameter Lindley Distribution for Modeling Waiting and Survival Times Data. Applied Mathematics, 4, 363-368.

[13] Surles, J. G., Padgett, W.J. (2001). Inference for reliability and stressstrength for a scaled Burr Type X distribution. Lifetime Data Anal. 7,187-200.

[14] Ibrahim, J.G., Chen, Ming-Hui, Sinha, D., 2001. Bayesian Survival Analysis. Springer-Verlag.

[15] Sahu, S.K., D.K. Dey, H. Aslanidou, & D. Sinha. 1997. A Weibull regression model with gamma frailties for multivariate survival data. Life time data analysis, 3, 123-137.

[16] Geweke, J. 1992. "Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments." In Bayesian Statistics 4 (eds. J.M. Bernardo, J. Berger, A.P. Dawid and A.F.M. Smith), 169-193. Oxford: Oxford University Press.

*Modeling Heterogeneity Using Lindley Distribution DOI: http://dx.doi.org/10.5772/intechopen.100340*

[17] Gelman, A., & D.B. Rubin. 1992. A single series from the Gibbs sampler provides a false sense of security. In Bayesian Statistics 4 (J. M. Bernardo, J. 0.Berger, A. P. Dawid and A. F. M. Smith, eds.). Oxford: Oxford Univ. Press, 625-632.

[18] McGilchrist, C.A., & C.W. Aisbett. 1991. Regression with frailty in survival analysis. Biometrics, 47, 461-466.

[19] Hanagal, D.D., & Bhambure, S.M. (2016). Modeling bivariate survival data using shared inverse Gaussian frailty model. Communications in Statistics-Theory and Methods, 45(17), 4969-4987.

Section 3
