6 Technical Problems in Patients on Hemodialysis

For *p=1*, the dual problem is:

Maximize

$$\sum_{i=1}^{N} y_i(\alpha_i-\alpha_i') - \varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i') - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i-\alpha_i')(\alpha_j-\alpha_j')\,\mathbf{x}_i'\mathbf{x}_j$$

subject to

$$\sum_{i=1}^{N}(\alpha_i-\alpha_i') = 0 \quad\text{and}\quad 0 \le \alpha_i,\ \alpha_i' \le C, \qquad i = 1,\dots,N,$$

where $\alpha_i$ and $\alpha_i'$ are Lagrangean multipliers satisfying $\alpha_i\alpha_i' = 0$. The following Karush-Kuhn-Tucker conditions should also be satisfied:

$$\alpha_i\,(\varepsilon+\xi_i-y_i+\boldsymbol{\beta}'\mathbf{x}_i+\beta_0) = 0, \qquad \alpha_i'\,(\varepsilon+\xi_i'+y_i-\boldsymbol{\beta}'\mathbf{x}_i-\beta_0) = 0,$$

$$\xi_i\,\xi_i' = 0, \qquad (\alpha_i-C)\,\xi_i = 0, \qquad (\alpha_i'-C)\,\xi_i' = 0, \qquad i = 1,\dots,N,$$

where $\xi_i, \xi_i' \ge 0$ are the slack variables of the primal problem (Cristianini and Shawe-Taylor, 2000).

Then the link between the dual and primal representations is given by

$$\hat{\boldsymbol{\beta}} = \sum_{i=1}^{N}(\alpha_i-\alpha_i')\,\mathbf{x}_i,$$

and $\hat\beta_0$ follows from the KKT conditions: for any $i$ with $0 < \alpha_i < C$ we have $\xi_i = 0$, so $\hat\beta_0 = y_i - \hat{\boldsymbol{\beta}}'\mathbf{x}_i - \varepsilon$.

In our application, for the SVM case, both input and output training data were centered and scaled to have zero mean and unit standard deviation. The values of the **β** coefficients in the raw data domain were calculated as follows:

$$\hat{\mathbf{Y}} = \bar{Y}\mathbf{1} + sd_y\,(\mathbf{X}-\mathbf{1}\bar{\mathbf{X}}')\,\mathbf{V}^{-1}\hat{\boldsymbol{\beta}}^{sd}_{SVM} = \hat\beta_0^{raw}\mathbf{1} + \mathbf{X}\hat{\boldsymbol{\beta}}^{raw}_{SVM}, \qquad \hat{\boldsymbol{\beta}}^{raw}_{SVM} = sd_y\,\mathbf{V}^{-1}\hat{\boldsymbol{\beta}}^{sd}_{SVM} \qquad (13)$$

where $\hat{\mathbf{Y}}$ is the estimated Ueq, $\hat{\boldsymbol{\beta}}^{sd}_{SVM}$ are the SVM coefficients fitted on the standardized data, $\mathbf{V}$ is a diagonal matrix of the standard deviations of each column of $\mathbf{X}$, $\bar{\mathbf{X}}$ is the vector of column means of $\mathbf{X}$, and $\mathbf{1}$ is a vector of ones. The mean and standard deviation of Ueq in the training data set are $\bar{Y}$ and $sd_y$, respectively. The intercept is

$$\hat\beta_0^{raw} = \bar{Y} - sd_y\,\bar{\mathbf{X}}'\mathbf{V}^{-1}\hat{\boldsymbol{\beta}}^{sd}_{SVM} = \bar{Y} - \bar{\mathbf{X}}'\hat{\boldsymbol{\beta}}^{raw}_{SVM}.$$

**2.5 Statistical modeling of equilibrated urea**

The three estimation procedures (OLS, PLS, and SVM) for obtaining the regression coefficients **β** of a linear model were applied to build bedside equations to estimate equilibrated urea from intradialysis urea samples and anthropometric data in 109 hemodialyzed patients. Estimation, selection and validation of the models were implemented in the R language (www.r-project.org) (see appendix). Prior to fitting a model, the appropriate number of factors (*A*) was chosen for PLS, and the best cost (*C*) and epsilon (*ε*) pair was chosen for SVM. For this purpose, a 15-fold cross-validation strategy was applied to a randomly chosen 70% of the patients in the data set. In the PLS case, models with *A=1, 2, 3, 4* and *5* factors were tested. For each model the cross-validation root mean prediction error (RMPE) was calculated, and then the mean RMPE over all partitions was obtained. The model achieving the smallest mean RMPE was chosen. For the linear SVM case, a 10×10 grid search over *C*×*ε* was performed, with *C* ranging from *4* to *6* and *ε* from *0.001* to *2*. A linear SVM model was built for each (*C*, *ε*) pair, and the cross-validation RMPEs were calculated and compared; the smallest mean RMPE was used as the selection criterion.

**3. Results**

In Table 2, cross-validation statistics for PLS models with different numbers of factors are shown. Table 2 summarizes the mean and standard deviation of the root mean prediction error (RMPE) and the mean and standard deviation of the correlation between estimated and measured Ueq (*R*). PLS models with 3 or 4 components are very competitive. We chose a linear fit with 3 factors because it yields the lowest RMPE with a parsimonious model.
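The factor-selection rule described above (15-fold cross-validation on a random 70% subset, keeping the number of factors *A* with the smallest mean RMPE) can be sketched as follows. This is a minimal Python illustration, not the R code used in this work: PLS1 is fitted with the NIPALS algorithm, RMPE is assumed to be the root mean squared prediction error of each held-out fold, and the data are synthetic stand-ins (76 rows, roughly 70% of 109, with five inputs), not patient measurements. The names `pls1_fit`, `cv_rmpe` and `choose_n_factors` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 regression via NIPALS; returns (intercept, beta) on the input scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centre inputs and response
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # unit-length weight vector
        t = Xk @ w                           # scores of this factor
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        qa = (yk @ t) / tt                   # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - qa * t                     # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    return y_mean - x_mean @ beta, beta


def cv_rmpe(X, y, n_comp, k=15, seed=0):
    """Mean and SD over k folds of the root mean squared prediction error (assumed RMPE)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        b0, b = pls1_fit(X[train], y[train], n_comp)
        resid = y[test] - (b0 + X[test] @ b)
        errs.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errs)), float(np.std(errs, ddof=1))


def choose_n_factors(X, y, max_comp=5, k=15, seed=0):
    """Return the number of factors with the smallest mean CV RMPE, plus all means."""
    means = {a: cv_rmpe(X, y, a, k, seed)[0] for a in range(1, max_comp + 1)}
    return min(means, key=means.get), means


# Synthetic stand-in data: 76 rows (about 70% of 109) and 5 inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(76, 5))
y = X @ np.array([0.5, 0.3, 0.8, 0.1, 0.0]) + rng.normal(scale=0.2, size=76)
best_a, means = choose_n_factors(X, y)
print(best_a, {a: round(m, 3) for a, m in means.items()})
```

Using the same seed for every candidate *A* keeps the partitions identical, so the mean RMPEs are compared on the same folds. With *A* equal to the number of inputs, PLS reproduces the OLS fit, which is a convenient sanity check.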


Table 2. Expected prediction error for PLS models with different numbers of factors.

In Fig. 1 the RMPE achieved by the SVM models is shown for each *C*×*ε* grid point. The chosen pair was *C= 4.2222* and *ε= 0.2223* (filled circle in Fig. 1).
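A 10×10 grid search like the one described can be sketched in Python as below. The sketch assumes evenly spaced grids (as produced by `numpy.linspace`); under that assumption the second *C* value is 4 + 2/9 ≈ 4.2222, matching the reported *C*, whereas the exact spacing used for *ε* is not stated in the text. The callable `cv_error` is a hypothetical placeholder for a function that fits a linear SVR with a given (*C*, *ε*) and returns its mean cross-validated RMPE; the quadratic `toy_error` below only exercises the search.

```python
import numpy as np


def grid_search(cv_error, c_range=(4.0, 6.0), eps_range=(0.001, 2.0), n=10):
    """Evaluate cv_error on an n x n (C, eps) grid and return the best pair."""
    c_grid = np.linspace(c_range[0], c_range[1], n)      # evenly spaced (assumption)
    eps_grid = np.linspace(eps_range[0], eps_range[1], n)
    errors = np.array([[cv_error(c, e) for e in eps_grid] for c in c_grid])
    i, j = np.unravel_index(np.argmin(errors), errors.shape)
    return c_grid[i], eps_grid[j], errors


# Hypothetical stand-in for "fit a linear SVR with (C, eps) and return its mean CV RMPE".
def toy_error(c, eps):
    return (c - 5.1) ** 2 + (eps - 1.2) ** 2


best_c, best_eps, errors = grid_search(toy_error)
print(round(best_c, 4), round(best_eps, 4))
```

Keeping the full error matrix, rather than only the minimum, is what allows a plot such as Fig. 1 to be drawn from the same search.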

Fig. 1. Cross-validation MSE for each *C*×*ε* combination in the SVMR algorithm. The best *C*×*ε* pair is indicated with a filled circle.

Bedside Linear Regression Equations to Estimate Equilibrated Blood Urea 9

Once the PLS and SVM models were selected, *i.e*. a 3-factor PLS model and an SVM trained with *C= 4.2222* and *ε= 0.2223*, the three methods (OLS, PLS<sub>A=3</sub> and SVM<sub>C=4.2222, ε=0.2223</sub>) were evaluated over the whole data set with a 20-fold cross-validation strategy. In Fig. 2 the relative prediction error (%PE) vs. true equilibrated urea and its corresponding smooth trend are shown for the three estimation strategies: open circles for OLS (dashed smooth trend), "\*" for PLS (dot-dashed smooth trend) and "+" for SVM (dotted smooth trend). OLS and PLS perform almost equally, with a small tendency of PLS towards greater overestimation at high Ueq values (the PLS smooth trend curve shows greater %PE than the other two). In contrast, SVM performs better for low Ueq (its dotted smooth trend is closer to zero %PE). In the midrange of Ueq the three methods perform similarly. All the methods tend to overestimate small Ueq values and underestimate high Ueq values.

Fig. 2. 20-fold cross-validation percentage prediction errors (%PE) for each tested model. Open circles for the OLS model, "\*" for PLS and "+" for SVMR. The smooth trend curve for each model is also presented (see text for references).

In Table 3, summary statistics for the prediction error and the percentage of data points with %PE in the ±10 and ±20 ranges are shown. The PLS model achieves the lowest mean %PE and SVM the highest, but SVM has the smallest standard deviation. In terms of the median, all the methods tend to overestimate the response; however, SVM presents a lower median %PE, suggesting robustness to outliers.

Table 3. Summary statistics for prediction errors and percentage of data points lying in the ±10 and ±20 %PE intervals

| Method | Prediction error: mean | Prediction error: SD | Prediction error: median | Data points with −10 ≤ %PE ≤ 10 | Data points with −20 ≤ %PE ≤ 20 |
|---|---|---|---|---|---|
| OLS | 0.08 | 9.59 | -2.44 | 55.05% | 85.32% |
| PLS | 0.06 | 9.60 | -0.255 | 55.96% | 87.16% |
| SVM | 1.08 | 9.26 | -1.72 | 63.30% | 90.83% |
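The statistics in Table 3 can be computed from measured and predicted Ueq with a few lines of Python. This sketch assumes %PE = 100·(Ueq − Ûeq)/Ueq; the sign convention is an assumption (the interval percentages do not depend on it), the function name `pe_summary` is hypothetical, and the four values in the usage line are made up for illustration.

```python
import numpy as np


def pe_summary(u_eq, u_eq_hat):
    """Mean, SD and median of %PE and the percentage of cases with |%PE| <= 10 and <= 20."""
    u_eq = np.asarray(u_eq, dtype=float)
    u_eq_hat = np.asarray(u_eq_hat, dtype=float)
    pe = 100.0 * (u_eq - u_eq_hat) / u_eq      # assumed sign: negative = overestimation
    return {
        "mean": float(pe.mean()),
        "sd": float(pe.std(ddof=1)),
        "median": float(np.median(pe)),
        "within_10": 100.0 * float(np.mean(np.abs(pe) <= 10)),
        "within_20": 100.0 * float(np.mean(np.abs(pe) <= 20)),
    }


# Made-up values for illustration only (measured vs. predicted Ueq).
summary = pe_summary([100, 100, 100, 100], [105, 85, 100, 130])
print(summary)
```

As an arithmetic check on Table 3, the interval percentages are consistent with counts out of the 109 patients: for example, 60/109 ≈ 55.05% and 99/109 ≈ 90.83%.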

In Fig. 3 the distributions of the **β̂** coefficients that weight each input variable (*β*<sub>1</sub> for U0, *β*<sub>2</sub> for U120, *β*<sub>3</sub> for U240, *β*<sub>4</sub> for Bw0, and *β*<sub>5</sub> for Uf) in the input scale (equation 8 for PLS and 13 for SVM) are shown. Coefficient *β*<sub>5</sub> (associated with Uf) is highly variable: it is mostly estimated as positive by OLS, negative by PLS, and with both signs by SVM. In the first two cases, *β*<sub>5</sub> was statistically different from zero (t-test, p<0.01). SVM estimation of **β** seems to be more robust than the other methods; in particular, the coefficient related to Uf (*β*<sub>5</sub>) shows significantly less dispersion than in the other models. In the OLS and PLS cases, all coefficients except that of Uf show similar behaviour, and the Uf coefficient for PLS is the most variable of all.
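The two transformations behind the SVM coefficients, the dual-to-primal link and the mapping of standardized coefficients to the raw scale in equation 13, can be checked numerically. In this Python sketch the multipliers are arbitrary random numbers rather than the output of an SVM solver, and the data are synthetic; the point is only that predicting on the standardized scale and undoing the scaling gives the same result as the raw-scale equation. `primal_beta` and `to_raw_scale` are hypothetical names, and the optional `b0_sd` argument (a standardized-scale intercept) defaults to 0, which corresponds to equation 13 as written.

```python
import numpy as np


def primal_beta(alpha, alpha_p, Z):
    """Linear-kernel link: beta = sum_i (alpha_i - alpha_i') z_i."""
    return (np.asarray(alpha) - np.asarray(alpha_p)) @ Z


def to_raw_scale(beta_sd, x_mean, x_sd, y_mean, y_sd, b0_sd=0.0):
    """Map standardized-scale coefficients to the raw scale (equation 13 when b0_sd = 0)."""
    beta_raw = y_sd * beta_sd / x_sd                 # sd_y V^-1 beta_sd
    b0_raw = y_mean + y_sd * b0_sd - x_mean @ beta_raw
    return b0_raw, beta_raw


# Synthetic data and arbitrary multipliers (not the output of an SVM solver).
rng = np.random.default_rng(0)
X = rng.normal(50.0, 10.0, size=(20, 5))
y = rng.normal(30.0, 5.0, size=20)
x_mean, x_sd = X.mean(axis=0), X.std(axis=0, ddof=1)
y_mean, y_sd = y.mean(), y.std(ddof=1)
Z = (X - x_mean) / x_sd                              # standardized inputs
alpha, alpha_p = rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)

beta_sd = primal_beta(alpha, alpha_p, Z)
b0_raw, beta_raw = to_raw_scale(beta_sd, x_mean, x_sd, y_mean, y_sd)
pred_via_sd = y_mean + y_sd * (Z @ beta_sd)          # predict, then undo the scaling
pred_raw = b0_raw + X @ beta_raw                     # raw-scale bedside form
print(np.max(np.abs(pred_via_sd - pred_raw)))
```

Because the back-transformation is exact, the raw-scale coefficients can be written directly into a bedside equation without keeping the training means and standard deviations around.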

Fig. 3. Distribution of the **β̂** coefficients for each input variable from the 20-fold cross-validation.
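Coefficient distributions like those in Fig. 3 come from refitting each model on the training part of every cross-validation fold. A minimal sketch of that bookkeeping for OLS is shown below (Python, synthetic data with 109 rows; `fold_coefficients` is a hypothetical name, and the same loop would apply to PLS or SVM by swapping the fitting step). The per-coefficient standard deviation across folds summarizes dispersion, and the share of folds with a positive estimate shows whether a coefficient changes sign, as described for *β*<sub>5</sub>.

```python
import numpy as np


def fold_coefficients(X, y, k=20, seed=0):
    """Refit OLS on the training part of each of k folds; returns a (k, p + 1) array."""
    idx = np.random.default_rng(seed).permutation(len(y))
    coefs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        coefs.append(coef)
    return np.array(coefs)


# Synthetic stand-in with 109 rows and 5 inputs (not the study data).
rng = np.random.default_rng(2)
X = rng.normal(size=(109, 5))
y = 2.0 + X @ np.array([0.4, 0.3, 0.9, 0.1, 0.0]) + rng.normal(scale=0.3, size=109)

B = fold_coefficients(X, y)
spread = B[:, 1:].std(axis=0, ddof=1)        # dispersion of beta_1..beta_5 across folds
frac_positive = (B[:, 1:] > 0).mean(axis=0)  # share of folds with a positive estimate
print(np.round(spread, 4), frac_positive)
```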


Fig. 4. Pairs plots and correlation coefficients between *U240*, *Bw0*, *Uf*, *Ueq* and urea rebound.

We showed that by means of linear models we were able to build bedside equations that can be easily implemented in any calculator or electronic spreadsheet such as Excel®. All the presented methods performed better than traditional methods (Smye et al., 1999) over the same data (Fernández et al., 2001), suggesting that simple linear approaches are appropriate. In addition, each hemodialysis centre can build its own predictor based on its own patient population by following the described process or implementing the accompanying source code (see appendix).

In this work we show that the use of an intradialysis sample (U120) provides valuable information to predict equilibrated urea. Smye et al. (1999) were the first to use an intradialysis sample to model Ueq. In clinical practice, however, the extraction of an additional blood urea sample could be very problematic. In a recent publication (Fernández et al., 2008) we showed that a linear model built without this urea sample can also provide accurate Ueq estimation. Future challenges for Ueq prediction by linear models are emerging with the implementation of different HD schedule proposals based on the variation of session time and/or weekly frequency.

**5. Appendix: R source code for OLS, PLS and SVM linear models to estimate equilibrated urea**

In order to apply the R (www.r-project.org) algorithm to build the linear models presented in this work, we assume that the patient database is stored in a comma-separated values (CSV) file as follows (any electronic spreadsheet program can save CSV files).
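Since the column layout of the CSV file is not given in this section, the following Python sketch assumes one row per patient with a header line and hypothetical column names `U0`, `U120`, `U240`, `Bw0`, `Uf` and `Ueq`. It loads the file, fits the OLS model and evaluates the resulting bedside equation Ueq = β0 + β1·U0 + β2·U120 + β3·U240 + β4·Bw0 + β5·Uf. It is an illustration in Python, not the R appendix code of this work, and the demonstration file is generated in memory from made-up coefficients.

```python
import csv
import io

import numpy as np

INPUTS = ["U0", "U120", "U240", "Bw0", "Uf"]   # hypothetical column names
TARGET = "Ueq"


def load_patients(handle):
    """Read a CSV with a header line and one row per patient into (X, y)."""
    rows = list(csv.DictReader(handle))
    X = np.array([[float(r[c]) for c in INPUTS] for r in rows])
    y = np.array([float(r[TARGET]) for r in rows])
    return X, y


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (b0, beta)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def bedside_ueq(b0, beta, u0, u120, u240, bw0, uf):
    """Ueq = b0 + b1*U0 + b2*U120 + b3*U240 + b4*Bw0 + b5*Uf."""
    return b0 + float(np.dot(beta, [u0, u120, u240, bw0, uf]))


# Demo: an in-memory CSV generated from made-up coefficients (not patient data).
true_b0, true_beta = 1.0, np.array([0.1, 0.2, 0.5, 0.05, -0.3])
rng = np.random.default_rng(0)
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(INPUTS + [TARGET])
for _ in range(30):
    x = rng.uniform(10, 100, size=5)
    writer.writerow(x.tolist() + [float(true_b0 + x @ true_beta)])
buf.seek(0)

X, y = load_patients(buf)
b0, beta = fit_ols(X, y)
print(round(b0, 6), np.round(beta, 6))
print(bedside_ueq(b0, beta, 50, 40, 30, 70, 2))
```

With a real file, `buf` would be replaced by `open("patients.csv", newline="")` (a hypothetical file name), and the printed coefficients are exactly the numbers to type into a calculator or spreadsheet formula.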
