76 Topics in Splines and Applications

Step 4 Define

$$\mathcal{H}_E = \mathcal{H}_0 \oplus \operatorname{span}\{R(x_i^*, \cdot),\ i = 1, 2, \ldots, n^*\}$$

as the effective model space.

By adaptive basis selection, the minimizer of (2) keeps the same form as that in Theorem 2.3:

$$\eta_A(x) = \sum_{k=1}^{M} d_k \xi_k(x) + \sum_{i=1}^{n^*} c_i R(x_i^*, x).$$

Let $R_*$ be an $n \times n^*$ matrix whose $(i, j)$th entry is $R(x_i, x_j^*)$, and let $R_{**}$ be an $n^* \times n^*$ matrix whose $(i, j)$th entry is $R(x_i^*, x_j^*)$. Then, the estimator $\eta_A$ satisfies

$$\boldsymbol{\eta}_A = S\mathbf{d}_A + R_*\mathbf{c}_A,$$

where $\boldsymbol{\eta}_A = (\eta_A(x_1), \ldots, \eta_A(x_n))^T$, $\mathbf{d}_A = (d_1, \ldots, d_M)^T$, and $\mathbf{c}_A = (c_1, \ldots, c_{n^*})^T$. Similar to (17), the linear system of equations in this case is

$$\begin{pmatrix} S^T S & S^T R_* \\ R_*^T S & R_*^T R_* + n\lambda R_{**} \end{pmatrix} \begin{pmatrix} \mathbf{d}_A \\ \mathbf{c}_A \end{pmatrix} = \begin{pmatrix} S^T \mathbf{y} \\ R_*^T \mathbf{y} \end{pmatrix}. \tag{18}$$

The computational complexity of solving (18) is of the order $O(n n^{*2})$, compared with $O(n^3)$ for the full basis method, so the method decreases the computational cost significantly when $n^* \ll n$. It can also be shown that the adaptive sampling basis selection smoothing spline estimator $\eta_A$ has the same convergence property as the full basis method. More details about the consistency theory can be found in [14]. Moreover, an adaptive sampling basis selection method for exponential family smoothing spline models was developed in [32].

3.2. Rounding algorithm

Other than sampling a smaller set of basis functions to save computational resources, as in the adaptive basis selection method presented previously, [15] proposed a new rounding algorithm to fit SSANOVA models in the context of big data.

Rounding algorithm: The details of the rounding algorithm are given in the following procedure:

Step 1 Assume that all predictors are continuous.

Step 2 Convert all predictors to the interval $[0, 1]$.

Step 3 Round the raw data by using the transformation:

$$z_{i\langle j\rangle} = \mathrm{RD}\!\left(\frac{x_{i\langle j\rangle}}{r_{\langle j\rangle}}\right) r_{\langle j\rangle}, \quad \text{for } i \in \{1, \ldots, n\},\ j \in \{1, \ldots, d\},$$

where the rounding parameter $r_{\langle j\rangle} \in (0, 1]$, and the rounding function $\mathrm{RD}(\cdot)$ transforms input data to the nearest integer.
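Steps 1 to 3 can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation of [15]: the min-max rescaling used for Step 2, the hypothetical helper names `rescale01` and `round_predictors`, and the choice $r_{\langle j\rangle} = 0.01$ for every predictor are all assumptions made here. R's `round()` follows the IEC 60559 convention for exact ties, which only matters for points lying exactly halfway between grid values.

```r
# Sketch of Steps 1-3 of the rounding algorithm (hypothetical helper names).
# Assumes every column of X is a continuous predictor (Step 1).

# Step 2 (assumed form): min-max rescale each predictor to [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Step 3: z_{i<j>} = RD(x_{i<j>} / r_{<j>}) * r_{<j>}, where RD() rounds to the
# nearest integer; r holds one rounding parameter in (0, 1] per predictor
round_predictors <- function(X, r) {
  stopifnot(length(r) == ncol(X), all(r > 0 & r <= 1))
  X <- apply(X, 2, rescale01)
  sweep(round(sweep(X, 2, r, "/")), 2, r, "*")
}

set.seed(1)
n <- 10000
X <- cbind(x1 = runif(n, 5, 20), x2 = rnorm(n))  # two continuous predictors
Z <- round_predictors(X, r = c(0.01, 0.01))

# Each rounded column takes at most 1 / r + 1 = 101 distinct values
sapply(as.data.frame(Z), function(z) length(unique(z)))
# Number of distinct design points after rounding, versus n raw points
nrow(unique(Z))
```

The point of the sketch is the last line: after rounding, many observations share the same design point, so the number of distinct points can be far smaller than $n$. How [15] exploits this in the subsequent fitting steps is not reproduced here.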
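Returning to the adaptive basis selection estimator of Section 3.1, the block system (18) can be assembled and solved directly once the basis points are fixed. The base-R sketch below is illustrative only and rests on several assumptions not taken from the chapter: it draws the $n^*$ basis points by simple random sampling rather than by the adaptive scheme of [14], and it uses a one-dimensional cubic spline model on $[0, 1]$ with $\mathcal{H}_0 = \operatorname{span}\{1, k_1(x)\}$ and the standard cubic spline kernel $R(x, y) = k_2(x)k_2(y) - k_4(|x - y|)$, where $k_1(x) = x - 1/2$, $k_2 = (k_1^2 - 1/12)/2$ and $k_4 = (k_1^4 - k_1^2/2 + 7/240)/24$. The names `k1`, `k2`, `k4`, `Rk` and `fit_adaptive` are hypothetical helpers.

```r
# Sketch: solve the block system (18) for an adaptive-basis smoothing spline.
# Assumptions (not from the chapter): 1-D cubic spline on [0, 1], null space
# H0 = span{1, k1(x)}, basis points drawn by simple random sampling.

k1 <- function(x) x - 0.5
k2 <- function(x) (k1(x)^2 - 1 / 12) / 2
k4 <- function(x) (k1(x)^4 - k1(x)^2 / 2 + 7 / 240) / 24

# Cubic spline reproducing kernel R(x, y) = k2(x) k2(y) - k4(|x - y|)
Rk <- function(x, y) outer(k2(x), k2(y)) - k4(abs(outer(x, y, "-")))

fit_adaptive <- function(x, y, n_star, lambda) {
  n <- length(x)
  x_star <- sample(x, n_star)            # basis points x_1^*, ..., x_{n*}^*
  S <- cbind(1, k1(x))                   # n x M matrix, (i, k) entry xi_k(x_i)
  R_s <- Rk(x, x_star)                   # R_*:  n x n*
  R_ss <- Rk(x_star, x_star)             # R_**: n* x n*
  # Left-hand block matrix and right-hand side of (18)
  A <- rbind(cbind(crossprod(S), crossprod(S, R_s)),
             cbind(crossprod(R_s, S), crossprod(R_s) + n * lambda * R_ss))
  b <- c(crossprod(S, y), crossprod(R_s, y))
  sol <- solve(A, b)                     # only an (M + n*) x (M + n*) system
  M <- ncol(S)
  d <- sol[1:M]
  c_coef <- sol[-(1:M)]
  list(d = d, c = c_coef, fitted = drop(S %*% d + R_s %*% c_coef),
       A = A, b = b, sol = sol)
}

set.seed(2)
n <- 2000
x <- runif(n)
f_true <- sin(2 * pi * x)
y <- f_true + rnorm(n, sd = 0.1)
fit <- fit_adaptive(x, y, n_star = 30, lambda = 1e-5)
mean((fit$fitted - f_true)^2)   # error of eta_A against the true curve
```

In this sketch, forming $R_*^T R_*$ costs $O(n n^{*2})$ operations, while `solve()` works on a matrix of size only $M + n^*$; the fixed $\lambda$ is for illustration, whereas in practice it would be chosen by a criterion such as GCV.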

4. Conclusion

Smoothing spline ANOVA (SSANOVA) models are widely used in applications [11, 20, 36, 37]. In this chapter, we introduced the general framework of SSANOVA models in Section 2. In Section 3, we discussed the models in big data settings. When the volume of data grows, fitting the models becomes computationally intensive [11]. The adaptive basis selection algorithm [14] and the rounding algorithm [15] we presented can significantly reduce the computational cost.

The se.fit parameter specifies whether pointwise standard errors of the predicted values are returned. The predicted values and the Bayesian confidence interval, shown in Figure 4, are generated by:

plot(x,y,col=1)

lines(new\$x,est\$fit,col=2)

lines(new\$x,est\$fit+1.96\*est\$se,col=3)

lines(new\$x,est\$fit-1.96\*est\$se,col=3)

Figure 4. The solid red line represents the fitted values. The green lines represent the 95% Bayesian confidence interval. The raw data are shown as the circles.

Smoothing Spline ANOVA Models and their Applications in Complex and Massive Datasets

http://dx.doi.org/10.5772/intechopen.75861

79

Example II: Apply the SSANOVA model to a real dataset.

In this example, we illustrate how to implement the SSANOVA model using the gss package. The data come from an experiment in which a single-cylinder engine is run with ethanol to see how the NOx concentration nox in the exhaust depends on the compression ratio comp and the equivalence ratio equi. The fitted model contains two predictors (comp and equi) and one interaction term.

library(gss)

data(nox)

nox.fit <- ssanova(log10(nox)~comp\*equi,data=nox)

The predicted values are shown in Figure 5.

x=seq(min(nox\$comp),max(nox\$comp),len=50)

y=seq(min(nox\$equi),max(nox\$equi),len=50)

temp <- function(x, y){
