3. Efficient approximation algorithm in massive datasets

In this section, we consider SSANOVA models in the big data setting. The computational cost of solving (17) is of the order $O(n^3)$, which poses a challenge for applying SSANOVA models as the volume of data grows. An obvious way to reduce the computational load is to select a subset of basis functions at random; however, uniform sampling can fail to preserve the features of the data. In the following, we present an adaptive basis selection method and show its advantages over uniform sampling [14]. Instead of selecting basis functions, another approach to reducing the computational cost is to shrink the original sample size through a rounding algorithm [15].

#### 3.1. Adaptive basis selection


A natural way to select the basis functions is through uniform sampling. Suppose that we randomly select a subset $\{\tilde{x}_1, \ldots, \tilde{x}_{\tilde{n}}\}$ from $\{x_1, \ldots, x_n\}$, where $\tilde{n}$ is the subsample size, and use the corresponding kernel functions $R(\tilde{x}_i, \cdot)$, $i = 1, \ldots, \tilde{n}$, as basis functions. Then, one minimizes (17) in the effective model space:

$$\mathcal{H}_E = \mathcal{H}_0 \oplus \mathrm{span}\{R(\tilde{x}_i, \cdot),\ i = 1, 2, \ldots, \tilde{n}\}.$$

The computational cost is then reduced significantly to $O(n\tilde{n}^2)$ when $\tilde{n}$ is much smaller than $n$. Furthermore, it can be proven that the minimizer of (2) obtained by uniform basis selection, $\tilde{\eta}$, has the same asymptotic convergence rate as the full-basis minimizer $\hat{\eta}$.
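
As a point of reference, the uniform scheme amounts to drawing the $\tilde{n}$ basis points at random. A minimal sketch in Python/NumPy follows; the function name and arguments are illustrative and not part of any existing software package.

```python
import numpy as np

def uniform_basis_points(x, n_tilde, seed=0):
    """Draw n_tilde basis points uniformly at random from the n observed predictors."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.shape[0], size=n_tilde, replace=False)
    # The representers R(x_i, .) of the selected points span the effective model space.
    return x[idx]
```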

Although uniform basis selection reduces the computational cost and the corresponding estimator $\tilde{\eta}$ achieves the optimal asymptotic convergence rate, it may occasionally fail to retain the data features. For example, when the data are unevenly distributed, uniform sampling is unlikely to capture the features of regions that contain only a few data points. In [14], an adaptive basis selection method is proposed. The main idea is to sample more basis functions in regions where the response function changes rapidly and fewer basis functions in flat regions. The adaptive basis selection method proceeds as follows (a small code sketch of the sampling step is given after the procedure):

Step 1 Divide the range of the responses $\{y_i\}_{i=1}^{n}$ into $K$ disjoint intervals, $S_1, \ldots, S_K$. Denote by $|S_k|$ the number of observations in $S_k$.

Step 2 For each $S_k$, draw a random sample of size $n_k$ from this collection. Let $x^{*(k)} = \{x^{*(k)}_1, \ldots, x^{*(k)}_{n_k}\}$ be the corresponding predictor values.

Step 3 Combine $x^{*(1)}, \ldots, x^{*(K)}$ to form the set of sampled predictor values $\{x^*_1, \ldots, x^*_{n^*}\}$, where $n^* = \sum_{k=1}^{K} n_k$.

Step 4 Define

$$\mathcal{H}_E = \mathcal{H}_0 \oplus \mathrm{span}\{R(x_i^*, \cdot),\ i = 1, 2, \ldots, n^*\}$$

as the effective model space.
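
The stratified sampling in Steps 1–3 can be sketched as follows (Python/NumPy). The roughly equal allocation of the per-stratum sizes $n_k$ is a simple illustrative choice rather than the exact scheme of [14].

```python
import numpy as np

def adaptive_basis_sample(x, y, n_star, K=10, seed=0):
    """Stratified sampling of basis points by response value (Steps 1-3).

    Step 1: split the range of y into K disjoint intervals S_1, ..., S_K.
    Step 2: draw a random sample of size n_k from each S_k
            (here n_k is roughly n_star / K, an illustrative allocation).
    Step 3: combine the K subsamples into the n* sampled predictor values.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(y.min(), y.max(), K + 1)
    strata = np.digitize(y, edges[1:-1])          # stratum index in {0, ..., K-1}
    chosen = []
    for k in range(K):
        members = np.where(strata == k)[0]        # observations falling in S_k
        if members.size == 0:
            continue
        n_k = min(members.size, max(1, n_star // K))
        chosen.append(rng.choice(members, size=n_k, replace=False))
    idx = np.concatenate(chosen)
    return x[idx]                                 # the basis points x*_1, ..., x*_{n*}
```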

By adaptive basis selection, the minimizer of (2) keeps the same form as that in Theorem 2.3:

$$\eta_A(x) = \sum_{k=1}^{M} d_k \xi_k(x) + \sum_{i=1}^{n^*} c_i R\left(x_i^*, x\right).$$

Let $\mathbf{R}_*$ be an $n \times n^*$ matrix whose $(i, j)$th entry is $R(x_i, x_j^*)$, and let $\mathbf{R}_{**}$ be an $n^* \times n^*$ matrix whose $(i, j)$th entry is $R(x_i^*, x_j^*)$. Then, the estimator $\eta_A$ satisfies

$$
\boldsymbol{\eta}_A = \mathbf{S} \mathbf{d}_A + \mathbf{R}_* \mathbf{c}_A,
$$

where $\boldsymbol{\eta}_A = (\eta_A(x_1), \cdots, \eta_A(x_n))^T$, $\mathbf{d}_A = (d_1, \ldots, d_M)^T$, and $\mathbf{c}_A = (c_1, \ldots, c_{n^*})^T$. Similar to (17), the linear system of equations in this case is

$$
\begin{pmatrix} \mathbf{S}^T \mathbf{S} & \mathbf{S}^T \mathbf{R}_* \\ \mathbf{R}_*^T \mathbf{S} & \mathbf{R}_*^T \mathbf{R}_* + n\lambda \mathbf{R}_{**} \end{pmatrix} \begin{pmatrix} \mathbf{d}_A \\ \mathbf{c}_A \end{pmatrix} = \begin{pmatrix} \mathbf{S}^T \mathbf{y} \\ \mathbf{R}_*^T \mathbf{y} \end{pmatrix}. \tag{18}
$$
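
To make the estimation step concrete, the sketch below (Python/NumPy; the function name and the use of a generic dense solver are illustrative assumptions) assembles the block system (18) and solves it for $\mathbf{d}_A$ and $\mathbf{c}_A$, given precomputed $\mathbf{S}$, $\mathbf{R}_*$, $\mathbf{R}_{**}$, and $\mathbf{y}$.

```python
import numpy as np

def solve_reduced_system(S, R_star, R_starstar, y, lam):
    """Assemble and solve the linear system (18) for d_A and c_A.

    S          : (n, M)   null-space basis functions evaluated at the data points
    R_star     : (n, n*)  matrix with entries R(x_i, x*_j)
    R_starstar : (n*, n*) matrix with entries R(x*_i, x*_j)
    y          : (n,)     response vector
    lam        : smoothing parameter lambda
    """
    n, M = S.shape
    lhs = np.block([
        [S.T @ S,       S.T @ R_star],
        [R_star.T @ S,  R_star.T @ R_star + n * lam * R_starstar],
    ])
    rhs = np.concatenate([S.T @ y, R_star.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:M], sol[M:]   # d_A (length M) and c_A (length n*)
```

A production implementation would typically exploit the symmetry of the system and use a Cholesky-type factorization instead of a general dense solve.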


The computational complexity of solving (18) is of the order $O(n n^{*2})$, so the method decreases the computational cost significantly. It can also be shown that the adaptive basis selection smoothing spline estimator $\eta_A$ has the same convergence property as the full-basis estimator; more details about the consistency theory can be found in [14]. Moreover, an adaptive basis selection method for exponential family smoothing spline models was developed in [32].

#### 3.2. Rounding algorithm

Rather than sampling a smaller set of basis functions to save computational resources, as in the adaptive basis selection method presented above, [15] proposed a rounding algorithm for fitting SSANOVA models in the context of big data.

Rounding algorithm: The details of the rounding algorithm are given in the following procedure:

Step 1 Assume that all predictors are continuous.

Step 2 Convert all predictors to the interval $[0, 1]$.

Step 3 Round the raw data by using the transformation:

$$z_{i\langle j\rangle} = RD(x_{i\langle j\rangle}/r_{\langle j\rangle})\, r_{\langle j\rangle} \text{ for } i \in \{1, \cdots, n\},\ j \in \{1, \cdots, d\},$$

where the rounding parameter $r_{\langle j\rangle} \in (0, 1]$ and the rounding function $RD(\cdot)$ rounds its input to the nearest integer.

Step 4 After replacing $x_{i\langle j\rangle}$ with $z_{i\langle j\rangle}$, we redefine $\mathbf{S}$ and $\mathbf{R}$ in (16) and then estimate $\eta$ by minimizing the penalized least squares (16).

Remark 1 In Step 3, if the rounding parameter $r_{\langle j\rangle}$ for the $j$th predictor is 0.03, then each $z_{i\langle j\rangle}$ is formed by rounding the corresponding $x_{i\langle j\rangle}$ to the nearest multiple of 0.03.

Remark 2 It is evident that the value of the rounding parameter influences the precision of the approximation: the smaller the rounding parameter, the better the model estimation and the higher the computational cost.

Computational benefits: We now briefly explain why the rounding algorithm reduces the computational load. For example, if the rounding parameter is $r = 0.01$, the rounded values of a predictor must lie on the grid $\{0, 0.01, \ldots, 1\}$, so $u \le 101$, where $u$ denotes the number of unique observed values after rounding. In conclusion, the user-tunable rounding algorithm can dramatically reduce the computational burden of fitting SSANOVA models from the order of $O(n^3)$ to $O(u^3)$, where $u \ll n$.
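
As a small illustration, the sketch below (Python/NumPy; not the authors' implementation) applies Steps 2 and 3 to one simulated predictor and counts the number of unique rounded values $u$.

```python
import numpy as np

def round_predictor(x, r=0.01):
    """Scale a predictor to [0, 1] and round it to the nearest multiple of r (Steps 2-3)."""
    x01 = (x - x.min()) / (x.max() - x.min())   # Step 2: map to [0, 1]
    return np.round(x01 / r) * r                # Step 3: z = RD(x / r) * r

# With r = 0.01 the rounded values lie on the grid {0, 0.01, ..., 1}, so the number
# of unique values u is at most 101 no matter how large the sample size n is.
x = np.random.default_rng(0).normal(size=307_200)
z = round_predictor(x, r=0.01)
print(np.unique(z).size)                        # u <= 101
```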

Case study: To illustrate the benefit of the rounding algorithm, we apply it to an electroencephalography (EEG) dataset. EEG is a monitoring method that records the electrical activity of the brain; it can be used to diagnose sleep disorders, epilepsy, encephalopathies, and brain death.

The dataset [33] contains 44 controls and 76 alcoholics. Each subject was measured repeatedly 10 times using a visual stimulus, sampled at a frequency of 256 Hz. This yields n = 10 replications × 120 subjects × 256 time points = 307,200 observations. There are two predictors, time and group (control vs. alcoholic). We apply a cubic spline to the time effect and a nominal spline to the group effect.

After fitting the model to the unrounded data and to data rounded with rounding parameters $r = 0.01$ and $r = 0.05$ for the time covariate, we summarize the GCV, AIC [34], BIC [35], and running times in Table 2.

Based on Table 2, we can see that there is no significant difference among the GCV, AIC, and BIC values. In addition, the rounding algorithm reduces the CPU time by approximately 92% compared with using the unrounded dataset.


| | GCV | AIC | BIC | CPU time (seconds) |
|---|---|---|---|---|
| Unrounded data | 85.9574 | 2,240,019 | 2,240,562 | 15.65 |
| Rounded data with $r = 0.01$ | 86.6667 | 2,242,544 | 2,242,833 | 1.22 |
| Rounded data with $r = 0.05$ | 86.7654 | 2,242,893 | 2,243,089 | 1.13 |

Table 2. Fit statistics and running time for SSANOVA models.

4. Conclusion

Smoothing spline ANOVA (SSANOVA) models are widely used in applications [11, 20, 36, 37]. In this chapter, we introduced the general framework of SSANOVA models in Section 2.
