**2.1 Copula formulation**

In our study, the copula models aim to capture the dependence between the observed/recorded nurses' rating and the automated sedation dose for each patient. We test for and utilise the so-called best fitting copula found. This section acts as a guide to decide if there exists a tail relationship between the nurses' rating of patient agitation A-S score and automated sedation dose.

Analytically and contextually, the lower tail region corresponds to the patient specific *mild agitation range* and the upper tail region corresponds to the *severe agitation range*, with the non-tail middle region capturing the patient's *moderate agitation range*. Clearly patient's transition between these so-called mild, moderate and severe ranges or states over time in ICU. In our context, if the two distributions that is of nurses' (observed) rating and that of the automated sedation dose are independent univariate Gaussians, then we can define the multivariate Gaussian distribution as the best fit. Let X and Y be independent Gaussian (with arbitrary means and variances), then *Zj* ¼ *aj***1***X* þ *aj***2***Y* is univariate Gaussian for j = 1, 2, … , n and *aj***1**, *aj***<sup>2</sup>** are real constants, and Z is multivariate Gaussian.

$$\mathbf{Let } \mathbf{Z} = \begin{pmatrix} \mathbf{Z\_1} \\ \vdots \\ \mathbf{Z\_n} \end{pmatrix} = \mathbf{A} \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \text{ be constructed based on the } n \times n \text{ matrix } A \text{ where } \mathbf{X} = \mathbf{A} \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the matrix } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the matrix } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ and } \mathbf{Z} = \mathbf{A} \mathbf{Z} \text{ are the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ and } \mathbf{Z} \text{ is the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ in } \mathbf{Z} \text{ and } \mathbf{Z} \text{ is the vector } \mathbf{Z} \text{ of } \mathbf{Z} \text{ in } \mathbf{Z} \text{ as well.}$$

and Y.

are independent and identically distributed (i.i.d.) N(0,1) random variables. Then <sup>Z</sup> <sup>þ</sup> <sup>μ</sup> is multivariate Gaussian with mean vector <sup>μ</sup> and covariance matrix <sup>¼</sup> *AAT*. From the Central Limit Theorem, the Gaussian distribution arises as a limit of a scaled sum of weakly dependent random variables [27, 28].

Parametric copula families are conventionally constructed to satisfy different combinations of bivariate dependence structures with tail behaviours [28]. The general definition of Archimedean copulas is given by Boateng [29]. It is noteworthy that the Clayton, Gumbel, and Frank copulas are examples of existing Archimedean copulas. A discussion about the Clayton, Gumbel, and Frank copulas, and tail dependence of a bivariate copula, Kendall's tau representations, and copula models of the Clayton, Frank, and Gumbel copulas are also defined in [29]. The Clayton copula, for example, accommodates only lower tail dependence [30], the Frank copula allows dependence around the mode [31], and the Gumbel is relevant only when upper tail dependence exists [32]. The difference between the Clayton and Gumbel copulas is: (i) for the Clayton copula, correlations on the extreme left sides of distributions are more concentrated (i.e., higher correlations) than those in the extreme right sides of the distributions, and (ii) for the Gumbel copula, the correlations on the extreme right sides of distributions are more concentrated (i.e., higher correlations) than those in the extreme left sides of the distributions. The visuals in Section 3 of this chapter illustrate these trends. We refer the reader to Boeting [29] and below give the bivariate Gaussian formulation and bivariate Frank and Gumbel copula, as three examples.

#### **Bivariate Gaussian copula**

The copula cdf is:

$$\mathbf{C}(\mathfrak{u}, \mathfrak{v}, \boldsymbol{\rho}) = \boldsymbol{\Phi}\_2(\boldsymbol{\Phi}^{-1}(\mathfrak{u}), \boldsymbol{\Phi}^{-1}(\mathfrak{v}); \boldsymbol{\rho}), \mathbf{0} < \mathfrak{u}, \mathfrak{v} < \mathbf{1}. \tag{1}$$

#### **Bivariate Frank**

For �∞ < *δ*< ∞, the copula cdf is:

$$\mathbf{C}(\mathfrak{u}, \mathfrak{v}, \delta) = -\delta^{-1} \log \left( \frac{\mathbf{1} - e^{-\delta} - \left( \mathbf{1} - e^{-\delta u} \right) \left( \mathbf{1} - e^{-\delta v} \right)}{\mathbf{1} - e^{-\delta}} \right), \mathbf{0} < \mathfrak{u}, \mathfrak{v} < \mathbf{1}. \tag{2}$$

#### **Bivariate Gumbel copula**

The copula cdf is: *C u*ð Þ¼ , *<sup>υ</sup>*, *<sup>δ</sup> exp* � � **log** *<sup>u</sup>* � �*<sup>δ</sup>* þ � **log** *<sup>υ</sup>* � �*<sup>δ</sup>* � �**<sup>1</sup>***=<sup>δ</sup>* � �**, 0**≤*u*, *υ*≤**1**, **1**≤*δ* ≤ ∞*:* Upper tail dependence function for Gumbel copula is:

$$b\_U(w\_1 w\_2; \delta) = w\_1 + w\_2 - \left(w\_1^\delta + w\_2^\delta\right)^{1/\delta}.\tag{3}$$

#### **2.2 Kendall K-plot construction**

The best fitting copula was selected by maximum likelihood estimation, except for the t-copula, for which the degrees of freedom parameter is found by a crude profile likelihood optimisation over the interval (2, 10]. We use the Kendall plot

*Copula Modelling of Agitation-Sedation (A-S) in ICU: Threshold Analysis of Nurses'… DOI: http://dx.doi.org/10.5772/intechopen.105753*

(K-plot) [33] to determine the bivariate patient-specific thresholds in the cases where the best fitting copula has tails. The K-plot splits the data into two regions with significantly different strengths of dependence between nurses' rating and the automated sedation dose, namely: (1) the main region with an approximately linear relationship; and (2) the tail regions with a non-linear relationship. Recently the K-plot has gained popularity with regard to its association with the receiver operating characteristic (ROC) curve, a pivotal biostatistical graphical tool traditionally used for testing the ability of biomarkers to discriminate between populations [34].

The K-plot adopts the familiar probability plot (Q-Q plot) to detect dependence. A lack of linearity of the standard Q-Q plot is an indication of nonnormality of the distribution of a random variable. Similarly, in the absence of association between two variables, the K-plot is close to a straight line, while the amount of curvature in the K-plot is characteristic of the degree of dependence in the data, and is related, in a definite way, to the underlying copula. This method is closely related to Kendall's tau statistic [35] from which it takes the name. For more details refer to [27, 33].

To construct a K-plot, we need to compute *Hi* defined for a given pair (*Xi*, *Yi*Þ with 1≤*i* ≤*n* as follows: *Hi* ¼ # *j* 6¼ *i* : *Xj* ≤ *Xi*, *Yj* ≤*Yi* � �*=*ð Þ *<sup>n</sup>* � <sup>1</sup> . Next, we need to order the variable *Hi*, *H*ð Þ<sup>1</sup> ≤ … ≤ *H*ð Þ *<sup>n</sup>* and plot the pairs *W*<sup>1</sup>:*<sup>n</sup>*, *H*ð Þ*<sup>i</sup>* � �, 1≤*i* ≤*n*, where W\_(1,n) is the expectation of the ith order statistic in a random sample of size n from the distribution *K*<sup>0</sup> of the *Hi* under the null hypothesis of independence. Using the definition of the density of an order statistic, we define the form of *K*<sup>0</sup> under the null hypothesis of the independence, as follows:

$$W\_{1:n} = n \binom{n-1}{i-1} \int\_0^1 a \left\{ K\_0(o) \right\}^{i-1} \times \left\{ 1 - K\_0(o) \right\}^{n-1} dK\_0(o), 1 \le i \le n. \tag{4}$$

### **2.3 Multiple threshold identification via dynamic programming algorithm**

To identify a patient-specific threshold, we apply the dynamic programming algorithm discussed in Section 3.3 of [23] to use the dependence measure *Hi*. This algorithm captures multiple thresholds; however, to be consistent with the objective of this paper, we focus on determining the lower and upper tail thresholds. In our study, the lower tail threshold corresponds to the lowest (lower) threshold and the upper tail threshold corresponds to the highest (upper) threshold, respectively for either dose or the nurses'score profiles.

## **2.4 Prediction equation of nurses' score with respect to dose allowing for tails**

Below we detail, as an illustration to the novel method that accommodates lower and upper thresholds beyond conventional correlational analysis using Kendall tau and copulas, the resultant equations specific to two patients, Patient 20 and Patient 8, which are both good trackers. P20 has no tails and P8 a lower tail. All the patients' equations and their details are tabulated in the Appendix A.

The general formulation of the equation relating each patient's nurses'score to the automated infusion dose is given by either of the following expressions:

```
• Score ¼ intercept þ α∗ Dose–β Dose ∗ LT þ γ Dose ∗ UT
```
• Score <sup>¼</sup> intercept <sup>þ</sup> slopeðDoseÞ–ð Þ slope : Lower region Dose þð Þ slope dose : Upper region *:*

Patient 20: The recorded or so-called nurses' observed A-S score and the automated sedation dose for patient 20 are independent univariate Gaussians, therefore, their joint distribution is Gaussian (**Tables 4** and **5** and LHS of **Figure 5**). This gives a bivariate Gaussian copula, with neither lower nor upper tails (RHS of **Figure 5**). Thus, the relationship between P20's nurses'score and infusion dose is estimated using a simple linear regression (SLR) equation, score = intercept + α\*dose, with intercept and slope parameters � 0.31 and α = 1.16, respectively (**Table 6** and RHS of **Figure 5**).

Patient 8: In contrast to patient 20, for patient 8 (good tracker) the nurses' recorded score and the automated sedation dose are skewed distributions (**Figure 6**), with the joint distribution being Survival Gumbel with a lower tail dependence. (lower tail *tau* = 0.70)


#### **Table 4.**

*Patient 20 equation components and p-values, good tracker, no tails.*


#### **Table 5.**

*Patient 8 (P8) equation components - good tracker, lower tail* tau *= 0.70.*

#### **Figure 5.**

*Bivariate plots (LHS), copula type, and relevant tau (RHS) relating P 20's nurses' A-S score with dose, good tracker, no tails. SLR line is not shown.*

*Copula Modelling of Agitation-Sedation (A-S) in ICU: Threshold Analysis of Nurses'… DOI: http://dx.doi.org/10.5772/intechopen.105753*


#### **Table 6.**

*List of copulas selected as optimal for the 36 patients (13 poor trackers). Shaded rows indicate the poor trackers. PX denotes patient number X.*

(see **Table 7** and **Figure 6**, top panel). Hence, the relationship between P8's nurses'score and the dose is estimated by our novel prediction equation, as follows.

Score = 1.01Dose-0.29Dose\*LT: The corresponding slope and lower tail parameters are α = 1.01 and β = 0.29 (**Table 7**). The highly significant slope of 1.01 (p < 0.00001) indicates that in this patient's moderate agitation zone nurses tend to estimate agitation severity quite accurately. However, in the mild agitation zone, nurses tend to assign a rating that is, on average, 0.29 points lower than expected for the patient's given (by automated dose) agitation severity (p = 0.0045) (RHS of bottom panel of **Figure 6**). There are 33 (out of a total of 127) such occurrences indicating that in approximately one in every four ratings, nurses tend to underestimate this patients' agitation severity, LT dose threshold = 3.02 and LT score threshold = 2.57.

The Kendall-plot is then used to identify the lower tail threshold which occurs at the 26th percentile in the patient's bivariate (dose, score) trajectory (**Table A.1** Appendix), estimated by the algorithm of Bai and Perron [23]. For patient 8, the LT infusion dose threshold is 3.02 and LT nurse's score threshold is 2.57 (see percentiles in **Table A.1**) and the full set of equations in **Table A.2** in Appendix.
