Appendix

In this appendix, we use two examples to illustrate how to implement smoothing spline ANOVA (SSANOVA) models in R. We use the gss package, which can be installed from CRAN (https://cran.r-project.org/) via install.packages("gss").

We now load the gss package:

```
library(gss)
```

Example I: Apply the smoothing spline to a simulated dataset.

Suppose that the predictor x follows a uniform distribution on [0, 1], and the response y is generated as y = 5 + 2cos(3πx) + e, where e ~ N(0, 1).

```
x<-runif(100);y<-5+2*cos(3*pi*x)+rnorm(x)
```

Then, we fit a cubic smoothing spline model:

```
cubic.fit<-ssanova(y~x)
```

To evaluate the predicted values, one uses:

```
new<-data.frame(x=seq(min(x),max(x),len=50))
est<-predict(cubic.fit,new,se=TRUE)
```

The se.fit argument (matched by the abbreviation se above through R's partial argument matching) indicates whether pointwise standard errors for the predicted values are returned. The predicted values and the 95% Bayesian confidence interval, shown in Figure 4, are generated by:

```
plot(x,y,col=1)                         # raw data
lines(new$x,est$fit,col=2)              # fitted values
lines(new$x,est$fit+1.96*est$se,col=3)  # upper 95% Bayesian confidence limit
lines(new$x,est$fit-1.96*est$se,col=3)  # lower 95% Bayesian confidence limit
```
Example II: Apply the SSANOVA model to a real dataset.

In this example, we illustrate how to implement the SSANOVA model using the gss package. The data come from an experiment in which a single-cylinder engine was run with ethanol to study how the concentration of nitrogen oxides (nox) in the exhaust depends on the compression ratio (comp) and the equivalence ratio (equi). The fitted model contains the two main effects of comp and equi and their interaction.
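In SSANOVA terms, the comp*equi specification in the ssanova call corresponds to the standard functional ANOVA decomposition of the regression function (notation ours, following the usual SSANOVA convention [11]):

f(comp, equi) = μ + f1(comp) + f2(equi) + f12(comp, equi),

where μ is the constant term, f1 and f2 are the main effects, and f12 is the interaction.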

```
data(nox)
```
In Section 3, we discussed these models under big data settings. When the volume of data grows, fitting the models is computing-intensive [11]. The adaptive basis selection algorithm [14] and the rounding algorithm [15] that we presented can significantly reduce the computational cost.
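To give a flavor of the rounding algorithm [15], its data-reduction step can be sketched in a few lines of base R. This is an illustrative sketch only, not the full algorithm, and the variable names are ours: rounding each predictor value to a coarse grid collapses the sample to a small set of unique design points, which caps the cost of building the spline basis regardless of the sample size.

```r
# Sketch of the rounding step in [15] (illustration only, not the full algorithm).
# Rounding x to 2 decimal places leaves at most 101 unique design points,
# no matter how large the sample is.
set.seed(1)
n <- 100000
x <- runif(n)
x.round <- round(x, 2)   # rounding parameter delta = 0.01
length(unique(x.round))  # at most 101 unique values instead of n
```

A smoothing spline fitted to the rounded data then works with at most 101 distinct design points rather than n, which is what makes the approach scale to super-large samples.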

Topics in Splines and Applications

Acknowledgements

This work is partially supported by the NIH grants R01 GM122080 and R01 GM113242; NSF grants DMS-1222718, DMS-1438957, and DMS-1228288; and NSFC grant 71331005.

Conflict of interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interests; and expert testimony or patent-licensing arrangements) or nonfinancial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.


```
nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
```
The predicted values, shown in Figure 5, are generated by:

```
x=seq(min(nox$comp),max(nox$comp),len=50)
y=seq(min(nox$equi),max(nox$equi),len=50)
```

```
# Evaluate the fitted surface over the grid and draw a perspective plot.
temp <- function(x, y){
  new=data.frame(comp=x,equi=y)
  return(predict(nox.fit,new,se=FALSE))
}
z=outer(x, y, temp)
persp(x,y,z,theta = 30)
```

Figure 4. The solid red line represents the fitted values. The green lines represent the 95% Bayesian confidence interval. The raw data are shown as circles.

Figure 5. The x-axis, y-axis, and z-axis represent the compression ratio, the equivalence ratio, and the predicted values, respectively.

Smoothing Spline ANOVA Models and their Applications in Complex and Massive Datasets
http://dx.doi.org/10.5772/intechopen.75861

References

[3] Friedman JH, Grosse E, Stuetzle W. Multidimensional additive spline approximation. SIAM Journal on Scientific and Statistical Computing. 1983;4(2):291-301

[4] Hastie TJ. Generalized additive models. In: Statistical Models in S. Routledge; 2017. pp. 249-307

[5] Stone CJ. Additive regression and other nonparametric models. The Annals of Statistics. 1985;13:689-705

[6] Stone CJ. The dimensionality reduction principle for generalized additive models. The Annals of Statistics. 1986;14:590-606

[7] Barry D et al. Nonparametric Bayesian regression. The Annals of Statistics. 1986;14(3):934-953

[8] Chen Z. Interaction Spline Models. University of Wisconsin–Madison; 1989

[9] Gu C, Wahba G. Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing. 1991;12(2):383-398

[10] Wahba G. Partial and Interaction Splines for the Semiparametric Estimation of Functions of Several Variables. University of Wisconsin, Department of Statistics; 1986

[11] Gu C. Smoothing Spline ANOVA Models. Volume 297. Springer Science & Business Media; 2013

[12] Wahba G. Spline Models for Observational Data. SIAM; 1990

[13] Wang Y. Smoothing Splines: Methods and Applications. CRC Press; 2011

[14] Ma P, Huang JZ, Zhang N. Efficient computation of smoothing splines via adaptive basis sampling. Biometrika. 2015;102(3):631-645

[15] Helwig NE, Ma P. Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, Special Issue on Statistical and Computational Theory and Methodology for Big Data. 2016;9:433-444

[16] Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. CRC Press; 1993

[17] Kimeldorf GS, Wahba G. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics. 1970;41(2):495-502

[18] Kimeldorf GS, Wahba G. Spline functions and stochastic processes. Sankhya: The Indian Journal of Statistics, Series A; 1970. pp. 173-180

[19] O'Sullivan F, Yandell BS, Raynor WJ Jr. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association. 1986;81(393):96-103

[20] Wahba G, Wang Y, Gu C, Klein R, Klein B. Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. The Annals of Statistics. 1995:1865-1895