
## **Chapter 5**

## A New Approach of Power Transformations in Functional Non-Parametric Temperature Time Series

*Haithem Taha Mohammed Ali and Sameera Abdulsalam Othman*

## **Abstract**

In nonparametric analyses, many authors indicate that kernel density functions work well when the variable is close to the Gaussian shape. This chapter focuses on improving the forecastability of functional nonparametric time series by using a new approach to the parametric power transformation. The choice of the power parameter in this approach is based on minimizing the mean integrated square error of the kernel estimation. Many authors have used this criterion in estimating density under the assumption that the original data follow a known probability distribution. In this chapter, the authors assume that the original data are of unknown distribution, set out the theoretical framework to derive a criterion for estimating the power parameter, and propose an application algorithm for two time series of monthly temperature averages.

**Keywords:** functional non-parametric time series, power transformation, Kernel density function, Mean Integrated Square Error

## **1. Introduction**

One of the most common approaches for studying forecasting models is the nonparametric functional regression method, which has been successfully applied in time series analysis. In this chapter, a new approach to power transformation is proposed to improve time series prediction when using functional nonparametric techniques. Although nonparametric regression estimation under dependence is a useful tool for forecasting in time series [1], the functional and nonparametric approaches do not work well in certain circumstances.

Regarding the functional approach, functional data (FD) analysis treats the observations as functions [2] without the need for fully parametric and non-parametric modeling conditions. In other words, FD analysis reduces the size of the data by explaining the correlations between a large number of variables with a small number of factors or functions [3]. This transformation of the data structure into a linear combination of a few functions (curves) is equivalent to structural regression models. A number of useful families of semi-metrics can be used to measure the proximities between the curves of the functional variables. One of these is Functional Principal Components Analysis (FPCA) [4, 5]. In some data sets with time-dependent observations, FPCA may lead to weak estimates, and this problem may be exacerbated in some time series data sets, especially those characterized by seasonal changes [6] (see also [7], who pointed out that standard PCA may not be a suitable technique when the data distribution is skewed or there are outliers).

As for the nonparametric approach to estimating kernel density functions (KDF) or predictions in regression and time series models, although this approach is a distribution-free method, the symmetry of the data is an important issue in order to obtain efficient estimators [8, 9].

As for time series and the goal of improving forecastability, it is known that the time series data sets in practical applications are rarely adapted for statistical analysis due to their instability in variance, trend, and seasonal variations [10].

Given these requirements, which must be addressed before analysis and inference in functional nonparametric time series analysis, the power transformation (PT) provides a novel corrective framework for predictive modeling.

The rest of the chapter is organized as follows. The second section reviews some traditional approaches of PT and their uses in KDF. In the third section, the authors present their proposal, a new approach for transforming the KDF. Section four gives the algorithm for applying the proposed method. The fifth section applies the proposed method to two temperature time series datasets. Finally, the sixth section gives the conclusions and some recommendations for future work.

## **2. The traditional approaches of transformations in KDF**

There is a long tradition of applying PT models in statistical applications. In 1952, Finney [11] used the PT model $\Psi(Z) = Z^{\lambda}$ when $\lambda \neq 0$ and $\Psi(Z) = \log(Z)$ when $\lambda = 0$, where $Z$ represents the original dose variable in the biological assay. The purpose of using a transformation in the dose-response relationship was to achieve monotonic and linear characteristics for intrinsically nonlinear models. In 1964, Box and Cox [12] proposed the following general class of transformations of the response variable in the multiple linear regression model,

$$\Psi(Z) = \begin{cases} \dfrac{Z^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(Z) & \text{if } \lambda = 0 \end{cases} \tag{1}$$

to achieve a linear relationship with normally distributed errors. In 1977, Tukey [13] described an orderly way of re-expressing variables using the following model, which preserves the order of the variable after the PT is applied,

$$\Psi(Z) = \begin{cases} Z^{\lambda} & \text{if } \lambda > 0 \\ \log(Z) & \text{if } \lambda = 0 \\ -Z^{\lambda} & \text{if } \lambda < 0 \end{cases} \tag{2}$$

*A New Approach of Power Transformations in Functional Non-Parametric Temperature Time… DOI: http://dx.doi.org/10.5772/intechopen.105832*

to make the relationship as close to a straight line as possible. As for nonparametric estimates, many authors refer to the usefulness of PT in reducing the bias of the KDF when the data are clearly skewed or heavy tailed [14] (for more details, see [15–17]).
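The three transformation families above can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the inputs are assumed to be strictly positive (temperatures at or below zero would first need a shift), and the chapter's own computations were carried out in R.

```python
import math

def finney(z, lam):
    """Finney-type power transform: z**lam for lam != 0, log(z) for lam == 0."""
    return math.log(z) if lam == 0 else z ** lam

def box_cox(z, lam):
    """Box-Cox transform, Eq. (1): (z**lam - 1) / lam for lam != 0, log(z) otherwise."""
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def tukey(z, lam):
    """Tukey re-expression, Eq. (2); the sign flip for lam < 0 preserves the order of z."""
    if lam > 0:
        return z ** lam
    if lam == 0:
        return math.log(z)
    return -(z ** lam)

sample = [1.0, 2.0, 4.0]  # made-up positive values (assumption)
for lam in (-1.0, 0.0, 0.5):
    print(lam, [round(tukey(z, lam), 3) for z in sample])
```

For a negative power the plain Finney transform reverses the ordering of the data, whereas the Tukey form keeps it, which is the property referred to in the text.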

Regarding the estimation of the transformation parameter, most transformation methodologies share a common analytical path: choosing the PT model and proposing an algorithm for estimating the power parameter in parallel with the estimation of the traditional model parameters. As for the approach of transforming the probability density function (PDF) of all or some model variables before proposing an estimation algorithm, there are at least two common methodologies of data transformation. The first in chronological order is the Box-Cox transformation (BCT) methodology, which transforms the response variable to normality in parametric multiple regression models [12]. The common decision rule for selecting the power parameter estimator in this approach is the maximization of the log-likelihood function of the PDF of the original data. In some cases, Bayesian estimation is used, and many other methods in the literature can also be used to choose the transformation parameter. The second methodology was proposed by Wand, Marron and Ruppert in 1991 [8] to transform the KDF to a symmetrical shape of the density function. In this methodology, the decision rule for selecting the optimal estimator of the density power parameter is the minimization of the Mean Integrated Square Error (MISE) of the KDF estimator. Both methodologies work with the distribution of the transformed data and therefore define the distribution of the original data as a "back-transform" obtained by the change-of-variable technique. Mathematically, in the case of the univariate random variable $Z$, Box and Cox [12] assume that there exists a parametric PT function $\psi(\cdot)$ of the random variable $Z$ such that $\psi(z) = Z^{(\lambda)} \sim N(\mu, \sigma^{2})$ under the assumption that the original data are of unknown distribution. Therefore, the PDF of the original variable is given by,

$$f_Z(z) = f_{\psi(z)}\left(\psi(z_i); \mu, \sigma^2\right) \left| \frac{d\psi(z)}{dz_i} \right| \tag{3}$$

In the second methodology [8], Wand, Marron, and Ruppert assume that the KDF of the transformed variable $\psi(z)$, which is close to a symmetrical shape, is given by,

$$f_{\psi(Z)}(u; \lambda) = f_Z\left\{\psi^{-1}(u)\right\} \left(\psi^{-1}\right)'(u), \quad u = \psi(z) \tag{4}$$

so that the estimated KDF of the original variable $Z$ is the back-transform of Eq. (4) and is given by,

$$\hat{f}_Z(z; h, \lambda) = n^{-1} \sum_{i=1}^{n} \psi'(z)\, K_h\left\{\psi(z) - \psi(Z_i)\right\} \tag{5}$$

where $h$ is the bandwidth, the kernel $K$ is a density, and $K_h(\cdot) = h^{-1}K(\cdot/h)$. In brief, the first methodology aims, in parametric models, to improve the efficiency of statistical inference based on the normality of the data, and the second methodology aims to improve the kernel estimator at least on the basis of symmetrical data. In the same context, the literature recommends the use of transformations as long as they can improve the interpretation of effect sizes between variables [13], or given the fact that model parameters are not easily interpreted in terms of the original response [14] (for more details, see [15–17]).
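The back-transformed estimator of Eq. (5) can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel, the power transform $\psi(z) = z^{\lambda}$ with $\lambda \neq 0$, and strictly positive data; the function names and the sample values are hypothetical and only serve the illustration.

```python
import math

def gaussian_kernel(t):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def transformed_kde(z, data, h, lam):
    """Evaluate Eq. (5) at a point z > 0 for psi(z) = z**lam, lam != 0."""
    u = z ** lam
    # |psi'(z)|; the absolute value covers decreasing transforms (lam < 0)
    jacobian = abs(lam) * z ** (lam - 1.0)
    # K_h(t) = K(t / h) / h, summed over the transformed observations
    kernel_sum = sum(gaussian_kernel((u - zi ** lam) / h) / h for zi in data)
    return jacobian * kernel_sum / len(data)

data = [1.2, 2.5, 3.1, 4.8, 7.9]  # made-up positive observations
print(transformed_kde(3.0, data, h=0.3, lam=0.5))
```

The kernel smoothing is done on the transformed scale, and the factor $\psi'(z)$ maps the estimate back to a density on the original scale.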

Now, assuming that $U = \psi(z)$, the optimum value of the PT parameter $\lambda$ is the one that corresponds to the lowest possible value of the MISE of the estimated density (Eq. (5)), which is given by,

$$\text{MISE}_Z(h,\lambda) = E \int \left\{\hat{f}_Z(z; h, \lambda) - f_Z(z)\right\}^2 dz \tag{6}$$

Assume that the first and second derivatives of the function $f_Z(z)$ exist, and let $K_1 = \int z^2 K(z)\,dz$ and $K_2 = \int K^2(z)\,dz$, so

$$\text{MISE}_Z(h,\lambda) = \text{AMISE}_Z(h,\lambda) + \mathcal{O}\left(h^4 + n^{-1}h^{-1}\right) \tag{7}$$

Where:

$$\text{AMISE}_{Z}(h,\lambda) = h^{4}\left(K_{1}^{2}/4\right) \int \psi'\left\{\psi^{-1}(u)\right\} f_{U}''(u;\lambda)^{2}\, du + n^{-1} h^{-1} K_{2}\, E\,\psi'(Z) \tag{8}$$

and the minimizing window width for any value of $\lambda$ is given by,

$$h_{\lambda,z}^{*} = \left[\frac{K_{2}\, E\,\psi'(Z)}{K_{1}^{2} \int \psi'\left\{\psi^{-1}(u)\right\} f_{U}''(u;\lambda)^{2}\, du}\right]^{1/5} n^{-1/5} \tag{9}$$

and the resulting minimum of $\text{AMISE}_{z}(\cdot, \lambda)$ for each fixed value of $\lambda$ equals,

$$\inf_{h>0} \text{AMISE}_{z}(h,\lambda) = (5/4) \left(K_1 K_2^2\right)^{2/5} J_{z}(\lambda)\, n^{-4/5} \tag{10}$$

Where,

$$J_{z}(\lambda) = \left[ \left\{ E\,\psi'(Z) \right\}^4 \int \psi'\left\{\psi^{-1}(u)\right\} f_U''(u;\lambda)^2\, du \right]^{1/5} \tag{11}$$

The last two equations (Eqs. (10) and (11)) represent a measure of the influence of the transformation $\psi(z)$ on minimizing the error associated with estimating the density of the original data, $\hat{f}_z(\cdot; h, \lambda)$. Therefore, the optimal value of $\lambda$ is the one that minimizes $\inf_{h>0} \text{AMISE}_z(h, \lambda)$.

By the same logic, applying the decision rule $\text{AMISE}_u(h, \lambda)$ to the density estimation of the transformed variable $U$, the asymptotically optimal window width for each $\lambda$ on the transformed scale is:

$$h^{*}_{\lambda, u} = \left[\frac{K_2}{K_1^2\, J_u(\lambda)^{5}\, n}\right]^{1/5} \tag{12}$$

Asymptotically, the optimal choice of **λ** minimizes

$$\inf_{h>0} \text{AMISE}_{u}(h,\lambda) = (5/4) \left(K_1 K_2^2\right)^{2/5} J_{u}(\lambda)\, n^{-4/5} \tag{13}$$


Where:

$$J_{u}(\lambda) = \left[ \int f_{U}''(u; \lambda)^{2}\, du \right]^{1/5} \tag{14}$$

In other words, minimizing $J_z(\lambda)$ and $J_u(\lambda)$ is sufficient to optimize $\lambda$, since Eq. (11) and Eq. (14) represent the only parts of Eq. (10) and Eq. (13), respectively, that vary with $\lambda$.
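As a numerical illustration of the transformed-scale quantities, the sketch below evaluates $J_u$, the optimal bandwidth and the minimized AMISE under two assumptions that are not part of the general theory above: a Gaussian kernel (for which $K_1 = 1$ and $K_2 = 1/(2\sqrt{\pi})$) and an exactly normal transformed density, for which $\int f_U''(u)^2\,du = 3/(8\sqrt{\pi}\,\sigma^5)$. The bandwidth uses this integral directly, that is, $J_u(\lambda)^5$ under the definition in Eq. (14). Function names are hypothetical.

```python
import math

K1 = 1.0                                # int z^2 K(z) dz for the Gaussian kernel
K2 = 1.0 / (2.0 * math.sqrt(math.pi))   # int K(z)^2 dz for the Gaussian kernel

def roughness_normal(sigma):
    """int f''(u)^2 du for a N(mu, sigma^2) density."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)

def j_u(sigma):
    """Eq. (14) under the normality assumption."""
    return roughness_normal(sigma) ** 0.2

def h_opt(sigma, n):
    """AMISE-optimal bandwidth on the transformed scale (J_u**5 is the roughness)."""
    return (K2 / (K1 ** 2 * j_u(sigma) ** 5 * n)) ** 0.2

def inf_amise(sigma, n):
    """Minimized AMISE on the transformed scale, Eq. (13)."""
    return 1.25 * (K1 * K2 ** 2) ** 0.4 * j_u(sigma) * n ** (-0.8)

print(h_opt(sigma=1.0, n=100), inf_amise(sigma=1.0, n=100))
```

Under these two assumptions the bandwidth reduces to the familiar normal-reference form $(4/(3n))^{1/5}\sigma$.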

Finally, the relationship between $\text{MISE}_u(h, \lambda)$ and $\text{MISE}_z(h, \lambda)$ can be determined according to the equations:

$$\text{MISE}_{z}(h,\lambda) = E \int \left\{ \hat{f}_{u}(u) - f_{u}(u) \right\}^2 \psi'\left\{\psi^{-1}(u)\right\} du \tag{15}$$

$$\text{MISE}_{u}(h,\lambda) = E \int \left\{ \hat{f}_{z}(z) - f_{z}(z) \right\}^2 \left(\psi^{-1}\right)'\left\{\psi(z)\right\} dz. \tag{16}$$

Both error functions yield the same results, whether in terms of the original variable or of the transformed variable.

## **3. A new approach of transformations in KDF**

Unlike the BCT methodology, which assumes that the original data are of unknown distribution, PTs in KDF estimation are used to shift random variables with a known distribution into symmetric shapes to obtain an efficient kernel density estimate. The statistical literature on nonparametric estimation has suggested the use of the MISE as a decision rule for power parameter estimation for a number of distributions, such as the lognormal [8, 18], gamma [8], Cauchy [9], Pareto [18] and heavy-tailed distributions [19, 20].

Now, similar to the BCT approach, the primary hypothesis of the new approach in this chapter is that the data do not have a known distribution. We will use the power transformation to transform the data to a normal shape and use the MISE as a decision rule to choose the optimal value of the power parameter. Later, in Sections 4 and 5, we will use this approach in functional nonparametric time series analysis.

Let us assume that we have the random variable $Z$ with unknown distribution and that $U = \psi(z)$ represents a PT model. For the Finney transformation (FT), suppose $U = z^{\lambda}$ follows the normal distribution with mean $\mu$ and variance $\sigma^{2}$. Therefore, according to Eq. (3), the PDF of the original variable $Z$ is given by $f_Z(z) = \psi'(z)\, f_U\left(\psi(z_i); \mu, \sigma^{2}\right)$.

In our proposed approach, the assumption of normality of the transformed data when the original data are of unknown distribution provides uncomplicated options for estimating the power parameter, so that Eq. (14) can be used as the simplest alternative to Eq. (11). Under our assumption, we have,

$$f_{U}(u;\lambda) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(u-\mu)^2}{2\sigma^2}}, \quad u \in \mathbb{R} \tag{17}$$

So, the square of the second derivative of Eq. (17) is,

$$\left(f^{\prime\prime}(u;\lambda)\right)^{2} = \left(\sqrt{2\pi\sigma^{2}}\right)^{-2} \exp\left(\frac{-\left(u-\mu\right)^{2}}{\sigma^{2}}\right) \left[\frac{1}{\sigma^{4}} + \frac{-2}{\sigma^{2}}\frac{\left(u-\mu\right)^{2}}{\sigma^{4}} + \frac{\left(u-\mu\right)^{4}}{\sigma^{8}}\right] \tag{18}$$
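For reference, Eq. (18) follows from differentiating the normal density of Eq. (17) twice. A short sketch of the intermediate steps, using only that density, is:

$$f'(u;\lambda) = -\frac{(u-\mu)}{\sigma^{2}}\, f_U(u;\lambda), \qquad f''(u;\lambda) = \left[\frac{(u-\mu)^{2}}{\sigma^{4}} - \frac{1}{\sigma^{2}}\right] f_U(u;\lambda)$$

Squaring the bracket gives the three terms of Eq. (18), and squaring $f_U$ gives the factor $\left(\sqrt{2\pi\sigma^{2}}\right)^{-2}\exp\left(-(u-\mu)^{2}/\sigma^{2}\right)$.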

Integrating Eq. (18) over $u$, we get,

$$\begin{aligned} \int f''(u;\lambda)^2\, du = \sigma^{-8} \left(\sqrt{2\pi\sigma^2}\right)^{-1} \Bigg[ &\int \sigma^4 \left(\sqrt{2\pi\sigma^2}\right)^{-1} e^{\frac{-(u-\mu)^2}{\sigma^2}} du \\ &- \int 2\sigma^2 (u-\mu)^2 \left(\sqrt{2\pi\sigma^2}\right)^{-1} e^{\frac{-(u-\mu)^2}{\sigma^2}} du \\ &+ \int (u-\mu)^4 \left(\sqrt{2\pi\sigma^2}\right)^{-1} e^{\frac{-(u-\mu)^2}{\sigma^2}} du \Bigg] \end{aligned} \tag{19}$$

Let $\sigma^{2} = 2\delta^{2}$; then the first term of Eq. (19) is,

$$\int \sigma^4 \frac{1}{\sqrt{2\pi \cdot 2\delta^2}}\, e^{\frac{-(u-\mu)^2}{2\delta^2}}\, du = \frac{\sigma^4}{\sqrt{2}} \int \frac{1}{\sqrt{2\pi \delta^2}}\, e^{\frac{-(u-\mu)^2}{2\delta^2}}\, du = \frac{\sigma^4}{\sqrt{2}} \tag{20}$$

and the second term of Eq. (19),

$$\begin{split} \int 2\sigma^{2} (u - \mu)^{2} \left( \sqrt{2\pi\sigma^{2}} \right)^{-1} e^{\frac{-(u-\mu)^{2}}{\sigma^{2}}} du &= \frac{2\sigma^{2}}{\sqrt{2}} \int (u - \mu)^{2} \left( \sqrt{2\pi\delta^{2}} \right)^{-1} e^{\frac{-(u-\mu)^{2}}{2\delta^{2}}} du \\ &= \frac{2\sigma^{2}}{\sqrt{2}} E (U - \mu)^{2} = \frac{\sigma^{4}}{\sqrt{2}} \end{split} \tag{21}$$

and the third term of Eq. (19),

$$\begin{split} \int (u - \mu)^4 \left(\sqrt{2\pi\sigma^2}\right)^{-1} e^{\frac{-(u-\mu)^2}{\sigma^2}} du &= \frac{1}{\sqrt{2}} \int (u - \mu)^4 \left(\sqrt{2\pi\delta^2}\right)^{-1} e^{\frac{-(u-\mu)^2}{2\delta^2}} du \\ &= \frac{1}{\sqrt{2}} \operatorname{E} (U - \mu)^4 \end{split} \tag{22}$$

by using the central moments equation of the real-valued random variable $U$, $E(U - \mu)^{n} = E \sum_{j=0}^{n} \binom{n}{j} (-1)^{n-j} U^{j} \mu^{n-j}$, then,

$$\mathbf{E}(\mathbf{U} - \boldsymbol{\mu})^4 = \mathbf{E}(\mathbf{U}^4) - \mathbf{4}\,\mu\,\mathbf{E}(\mathbf{U}^3) + \mathbf{6}\,\mu^2\mathbf{E}(\mathbf{U}^2) - \mathbf{4}\,\mu^3\,\mathbf{E}(\mathbf{U}) + (\mu^4) \tag{23}$$

Based on the moments about zero, $\mu_k = E(U^k)$, we get,

$$E(U - \mu)^4 = \mu_4 - 4\mu_3\mu + 6\mu_2\mu^2 - 4\mu\,\mu^3 + \mu^4 \tag{24}$$

Substituting the three parts defined by Eq. (20), Eq. (21) and Eq. (22) into Eq. (19), the first two terms cancel, and taking the fifth root as in Eq. (14) gives,

$$J_U(\lambda) = \left[\frac{1}{\sigma^8} \left(\sqrt{2\pi\sigma^2}\right)^{-1} \frac{1}{\sqrt{2}} \left(\mu_4 - 4\mu_3\mu + 6\mu_2\mu^2 - 4\mu\,\mu^3 + \mu^4\right)\right]^{1/5} \tag{25}$$

Eq. (25) is the end of the derivation. The optimal power parameter value is the one that minimizes the value of $J_U(\lambda)$. In the practical application, maximum likelihood estimators were used for the moments about zero, $\hat{\mu}_k = \sum_{i=1}^{n} u_i^k / n$, and the central moments, $\hat{\mu}_k = \sum (u_i - \bar{u})^k / n$.
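A plug-in version of Eq. (25) and a simple grid search over $\lambda$ might look as follows in Python. This is a sketch under stated assumptions, not the authors' R code: the function names, the grid spacing and the sample values are hypothetical, the data are assumed strictly positive, and only the power form $u = z^{\lambda}$ with $\lambda \neq 0$ is covered.

```python
import math

def j_u_hat(u):
    """Plug-in estimate of J_U(lambda) in Eq. (25) from transformed values u."""
    n = len(u)
    mean = sum(u) / n
    var = sum((x - mean) ** 2 for x in u) / n             # ML estimate of sigma^2
    raw = [sum(x ** k for x in u) / n for k in range(5)]  # raw[k]: k-th moment about zero
    # Eq. (24): fourth central moment written with moments about zero
    m4 = (raw[4] - 4.0 * raw[3] * mean + 6.0 * raw[2] * mean ** 2
          - 4.0 * raw[1] * mean ** 3 + mean ** 4)
    integral = var ** -4 / math.sqrt(2.0 * math.pi * var) * m4 / math.sqrt(2.0)
    return integral ** 0.2

def best_lambda(z, grid):
    """Return the grid value (lambda != 0) with the smallest J_U, plus all scores."""
    scores = {lam: j_u_hat([x ** lam for x in z]) for lam in grid if lam != 0}
    return min(scores, key=scores.get), scores

z = [12.1, 15.4, 20.3, 26.8, 31.5, 33.9, 30.2, 24.7, 18.6, 13.0]  # made-up values
grid = [k / 2.0 for k in range(-6, 7)]                            # -3.0, -2.5, ..., 3.0
lam_star, scores = best_lambda(z, grid)
print(lam_star, scores[lam_star])
```

Plotting the pairs in `scores` against $\lambda$ would give a curve of the kind examined later in the chapter's figures.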


## **4. Proposed application algorithm**

For the univariate time series $\{Z_t, t \in R\}$, assume that the sample is divided into $(p+1)$ statistical samples of size $n = N - s - p + 1$ so that the time series data set can be defined as functional data $\{(X_i, Y_i)\}_{i=1,\ldots,n}$. The regression model,

$$Y = m(X) + \varepsilon \tag{26}$$

represents the relationship between the smooth functional data $m(X)$ and the scalar response $Y_i = Z_{i+s}$, $i = p, \ldots, N - s$. The white noise $\varepsilon$ is a sequence of independent identically distributed functions such that $E(\varepsilon \mid X) = 0$. $X_1, X_2, \ldots, X_n$ are identically distributed as the functional random variable $X_i = (Z_{i-p+1}, \ldots, Z_i)$. Assume $N = n\tau$ for some $n \in \mathbb{N}$ and some $\tau > 0$ to get a statistical sample of curves $X_i = \{Z(t), (i-1)\tau < t \leq i\tau\}$ of size $(n-1)$ and the response $Y_i = Z(i\tau + s)$, $i = 1, \ldots, n-1$ [5]. The kernel regression estimator of $m(X)$ in Eq. (26), evaluated at a given function, is:

$$\hat{m}(X) = \frac{\sum_{i=1}^{n} Y_i\, K\left(h^{-1}\, d(X, X_i)\right)}{\sum_{i=1}^{n} K\left(h^{-1}\, d(X, X_i)\right)} \tag{27}$$

where $K$ is a kernel function, $h$ (depending on $n$) is a positive real bandwidth, and $d(X, X_i)$ denotes any semi-metric index of proximity between the observed curves based on the functional principal components [5, 6, 21]. Many authors have proposed methods for measuring the proximity, such as the FPCA method, in which the proximity between two curves $X_i$ and $X_j$ is measured by the square root of the quantity $\int \left(X_i(t) - X_j(t)\right)^2 dt$ or the quantity $\int \left(X_i^{(2)}(t) - X_j^{(2)}(t)\right)^2 dt$ (for more details, see [4, 21–25]).
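The estimator in Eq. (27) can be sketched in Python as below. The sketch makes two simplifying assumptions that the chapter does not prescribe: the semi-metric is the plain discretised $L^2$ distance between curves (the first quantity above, without a principal components projection), and the kernel is a quadratic kernel supported on $[0, 1]$. All names are hypothetical.

```python
import math

def l2_semimetric(x, y):
    """Discretised square root of int (X_i(t) - X_j(t))^2 dt for equally spaced t."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def quadratic_kernel(t):
    """Quadratic kernel on [0, 1]; distances are non-negative, so only t >= 0 matters."""
    return 1.5 * (1.0 - t * t) if 0.0 <= t <= 1.0 else 0.0

def nw_functional(x_new, curves, responses, h,
                  d=l2_semimetric, kernel=quadratic_kernel):
    """Nadaraya-Watson estimate of m(x_new) as in Eq. (27)."""
    weights = [kernel(d(x_new, xi) / h) for xi in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no curve lies within bandwidth h of x_new")
    return sum(w * y for w, y in zip(weights, responses)) / total

curves = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]  # made-up discretised curves
responses = [0.0, 2.0, 100.0]
print(nw_functional([0.5, 0.5], curves, responses, h=1.0))
```

Curves farther than $h$ from the query receive zero weight, so the estimate is a local average of the responses attached to the nearest curves.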

The application methodology consists of estimating the smooth functional data $m(X)$ in the regression equation Eq. (26) with the kernel estimator Eq. (27) after transforming the time series dataset. The proposed application algorithm for the nonparametric estimation of the transformed functional time series, following the new approach for transforming the kernel density, is as follows:

Step 1: Choose the common range $\Lambda = \{-3, 3\}$ for the power parameter $\lambda$.

Step 2: Calculate the value of $J_{\psi(Z)}(\lambda)$ according to Eq. (14).

Step 3: Transform the original response variable $Z$ according to the Finney [11] PT model, $\Psi(Z) = Z^{\lambda}$ when $\lambda \neq 0$, and the BCT model Eq. (1), to get the explanatory functional matrices $\Psi_\lambda(X) = [\Psi_\lambda(z)]_{n \times \tau}$ (for more about organizing the matrix files in the R program, see [5, 21]).

Step 4: Redefine the functional data of the regression model $\{(X_i, Y_i)\}_{i=1,\ldots,n}$ so that the statistical sample of curves $X_i = \{Z(t), (i-1)\tau < t \leq i\tau\}$ is defined as follows,

$$\Psi_{\lambda}(X_{i}) = \left\{\Psi_{\lambda}(Z(t)),\ (i - 1)\tau < t \le i\tau\right\} \tag{28}$$

and the response $Y_i = Z(i\tau + s)$, $i = 1, \ldots, n-1$, is defined as follows,

$$\Psi_{\lambda}(Y_{i}) = \Psi_{\lambda}(Z(i\tau + s)) \tag{29}$$

Step 5: Define Eq. (28) and Eq. (29) with $\tau$ equal to the seasonal length.

Step 6: Estimate the explanatory function regression $\Psi_\lambda(Y_i) = m(\Psi_\lambda(X)) + \varepsilon$, where

$$\hat{m}(\Psi_{\lambda}(X)) = \frac{\sum_{i=1}^{n} \Psi_{\lambda}(Y_{i})\, K\left(h^{-1} d\left(\Psi_{\lambda}(X), \Psi_{\lambda}(X_{i})\right)\right)}{\sum_{i=1}^{n} K\left(h^{-1} d\left(\Psi_{\lambda}(X), \Psi_{\lambda}(X_{i})\right)\right)} \tag{30}$$

by using the Nadaraya–Watson regression estimator for functional data.

Step 7: Perform steps 2 through 6 for all $\lambda \in \Lambda$.

Step 8: Choose the optimal value of $\lambda$ that corresponds to the lowest value of $J_U(\lambda)$.

Step 9: Calculate the estimated mean square error of the last curve, $MSE(X_n) = (1/s) \sum_{j=1}^{s} \left(\hat{z}_j - z_j\right)^2$, where $\hat{z}_j$ and $z_j$ are the $j$-th estimated and real values, respectively, in the last curve. The $\hat{z}_j$ values are computed from the back-transform of $\psi(z) = z_i^{\lambda}$.
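The transformation, curve construction, prediction and back-transformation steps above can be put together in a compact Python sketch for one fixed $\lambda$. It is illustrative only and rests on several assumptions not fixed by the chapter: a plain $L^2$ semi-metric, a quadratic kernel on $[0, 1]$, a user-supplied bandwidth $h$, strictly positive data, the power form $z^{\lambda}$ with $\lambda \neq 0$, and prediction of the last curve from the one before it. All function names are hypothetical.

```python
import math

def l2_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def quadratic_kernel(t):
    return 1.5 * (1.0 - t * t) if 0.0 <= t <= 1.0 else 0.0

def predict_last_curve(z, tau, lam, h):
    """Predict the last curve of z (length n * tau) on the original scale."""
    if len(z) % tau != 0 or lam == 0:
        raise ValueError("need len(z) = n * tau and lam != 0")
    u = [x ** lam for x in z]                               # power transform (Step 3)
    curves = [u[i:i + tau] for i in range(0, len(u), tau)]  # curves as in Eq. (28)
    train_x, query = curves[:-2], curves[-2]                # last curve is held out
    weights = [quadratic_kernel(l2_distance(query, x) / h) for x in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth h too small: no curve near the query")
    preds = []
    for s in range(tau):
        # Eq. (29): response s + 1 steps after the end of each training curve
        train_y = [curves[i + 1][s] for i in range(len(train_x))]
        m_hat = sum(w * y for w, y in zip(weights, train_y)) / total  # Eq. (30)
        preds.append(m_hat ** (1.0 / lam))                  # back-transform (Step 9)
    return preds

def mse_last_curve(z, tau, preds):
    """Mean square error between the predicted and observed last curve."""
    actual = z[-tau:]
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / tau

# Made-up, perfectly periodic monthly series (6 seasons of length 12)
z = [20.0 + 10.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(72)]
preds = predict_last_curve(z, tau=12, lam=0.5, h=1.0)
print(mse_last_curve(z, 12, preds))
```

Repeating the call over a grid of $\lambda$ values and comparing the resulting errors with the $J_U(\lambda)$ scores would mirror the loop over $\Lambda$ described in the steps.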

**Figure 1.** *Plots of the monthly temperature averages series: (a) TSN; (b) TST [21].*


In all PT methodologies, the decision rule for choosing the optimal power parameter always leads to what we might call the area of feasible solutions. For example, the debatable question in BCT is: does the optimal parameter obtained by maximizing the likelihood of the original response achieve the normality of the transformed variable in practice? The authors believe that this problem is due to the nature of the data. In the proposed approach, where the optimal power parameter corresponds to the lowest $J_U(\lambda)$, the challenge is the complexity of the feasible solutions area that we aim to achieve: normality of the transformed response in practical application, which provides quality conditions for both the functional and the nonparametric approaches in nonstationary seasonal time series (for more, see [26, 27], which point to other challenges related to the use of PT and the quality of the power parameter estimation).

## **5. Applications**

The PT models indicated in the proposed application algorithm have been applied to two examples of nonstationary time series of monthly temperature averages [21]. The first has a size of 200 observations of Nineveh City in Iraq (TSN) for the period 1976 to 2000 (**Figure 1a**). The second has a size of 300 observations of Tunisia (TST) for the period 1991 to 2015 (**Figure 1b**). R software was used to analyze the data. The data are available at https://climateknowledgeportal.worldbank.org.

**Figure 2.**

*The curves of the ordered pairs* $(\hat{\lambda}_i, J_u(\lambda))$ *of the transformed responses of the two time series data sets using FT: (a) TSN. (b) TST.*

Returning to the idea of the feasible solutions area, we must verify the results of choosing the optimal PT value according to the proposed density transformation approach and its contribution to achieving the analysis efficiency requirements: the concavity of $J_u(\lambda)$, the normality of the transformed response, and the reduction of the prediction error in the functional nonparametric time series analysis.

Mathematically, $J_u(\lambda)$ is a concave function, but a number of authors note the possibility that a minimum point does not exist or is not unique [8, 18]. This conclusion may depend on the success in choosing the appropriate PT model [20]. The plots in **Figures 2** and **3** show the curves of the ordered pairs $(\lambda_i, J_u(\lambda))$ of Eq. (14).

In **Figure 2**, it can be seen that the curves of the two time series data sets using FT have a concavity point in the range $(-3, 0)$, while the $J_u(\lambda)$ values tend to zero, with the curves flattening towards the horizontal line, in the range $(0, 3)$.

When applying BCT, however, it becomes clear from **Figure 3** that there is no point of concavity in the curves of $J_u(\lambda)$, as its value goes to zero whenever the value of $\lambda$ goes to $-3$. Therefore, it is not possible to obtain an optimal value for $\lambda$.

As for the normality of the transformed data, **Table 1** shows, for the two examples, that the response variable data in their original and transformed states are not normal. Neither optimal value of $\lambda$ corresponding to the minimum values of $J_u(\lambda)$ shifted the data to the normal shape. On the other hand, the improvement in the forecastability of the two time series was evident in the estimates of the mean square errors of the last curve (**Table 2**).

#### **Figure 3.**

*The curves of the ordered pairs* $(\hat{\lambda}_i, J_u(\lambda))$ *of the transformed responses of the two time series data sets using BCT: (a) TSN. (b) TST.*



#### **Table 1.**

*The data normality tests of the original and transformed responses in the two examples using FT model.*


**Table 2.**

*The MSE estimates of the last curve* $X_n$ *of the two time series datasets.*

## **6. Conclusions**

The analysis of parametric and non-parametric time series, like any statistical modeling process, requires certain conditions to hold so that the results of statistical inference are reliable, which in turn contributes to improving forecastability.

Data are rarely ready for statistical analysis, which necessitates the use of power transformation to improve the required output. In this chapter, power transformation has been used with a new methodology to improve the outputs of the analysis in the following three directions: time series, nonparametric estimation and functional analysis. The authors therefore faced the challenge of choosing the optimal power parameter estimation method in accordance with the conditions of the feasible solutions area for the three directions. Using MISE as a criterion for choosing the power parameter in the proposed method did not achieve the normality of the data, but it enhanced the forecastability of the time series.

Of the FT and BCT models applied, the first was applicable and fulfilled the concavity condition for the transformation effect measurement function $J_u(\lambda)$, while with the second model the function curve was divergent at both ends of the power parameter range.

In the future, we recommend developing the proposed methodology using other transformation models and looking into the possibility of using it in other shapes of time series.

*Time Series Analysis - New Insights*

## **Author details**

Haithem Taha Mohammed Ali<sup>1,2</sup> and Sameera Abdulsalam Othman<sup>3</sup>\*

1 Department of Economic Sciences, University of Zakho, Kurdistan Region, Iraq

2 Department of Economic, Nawroz University, Kurdistan Region, Iraq

3 Department of Mathematics, College of Basic Education, University of Duhok, Kurdistan Region, Iraq

\*Address all correspondence to: sameera.othman@uod.ac

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **References**

[1] Aneiros-Pérez G, Cao R, Vilar-Fernández JM. Functional methods for time series prediction: A nonparametric approach. Journal of Forecasting. 2010;**30**(4):377-392. DOI: 10.1002/for.1169

[2] Kidziński Ł. Functional time series. Preprint arXiv:1502.07113 [stat.ME]. 2015. https://doi.org/10.48550/arXiv.1502.07113

[3] Kannel PR, Lee S, Kanel SR, Khan SP. Chemometric application in classification and assessment of monitoring locations of an urban river system. Analytica Chimica Acta. 2007;**582**(2):390-399. DOI: 10.1016/j.aca.2006.09.006

[4] Dauxois J, Pousse A, Romain Y. Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference. Journal of Multivariate Analysis. 1982;**12**(1):136-154

[5] Ferraty F, Vieu P. Nonparametric models for functional data, with application in regression, time series prediction and curve discrimination. Nonparametric Statistics. 2004;**16**(1–2): 111-125

[6] Shang H, Xu R. Functional time series forecasting of extreme values. Communication Statistics. 2021;**7**(2): 182-199

[7] Maadooliat M, Huang JZ, Hu J. Integrating data transformation in principal components analysis. Journal of Computational and Graphical Statistics. 2015;**24**(1):84-103. DOI: 10.1080/10618600.2014.891461

[8] Wand MP, Marron JS, Ruppert D. Transformations in density estimation. Journal of the American Statistical Association. 1991;**86**(414):343-353

[9] Ruppert D, Wand MP. Correcting for kurtosis in density estimation. Australian and New Zealand Journal of statistics. 1992;**34**(1):19-29

[10] Chavez-Demoulin V, Davison AC. Modelling time series extremes. REVSTAT-Statistical Journal. 2012; **10**(1):109-133

[11] Finney DJ. Statistical Method in Biological Assay. 1st ed. London: Charles Griffin & Co. Ltd; 1952

[12] Box GEP, Cox DR. An Analysis of Transformations. Journal of the Royal Statistical Society. Series B (Methodological). 1964;**26**(2):211-252

[13] Tukey JW. On the comparative anatomy of transformations. The Annals of Mathematical Statistics. 1957;**28**: 602-632. DOI: 10.1214/aoms/1177706875

[14] Yang L, Marron JS. Iterated transformation–kernel density estimation. Journal of the American Statistical Association. 1999;**94**(446):580-589

[15] Bean A, Xinyi X, MacEachern S. Transformations and Bayesian density estimation. Electronic Journal of Statistics. 2016;**10**(2):3355-3373

[16] Pitt D, Guillen M, Bolancé C. Estimation of parametric and nonparametric models for univariate claim severity distributions: An approach using R. Journal of Financial Education. 2011;**42**(1–2):154-175

[17] Sakthivel KM, Rajitha CS. Kernel density estimation for claim size distributions using shifted power transformation. International Journal of Science and Research. 2013;**6**(14):2025-2028

[18] Bolance C, Guillen M, Perch Nielsen J. Kernel density estimation of actuarial loss functions. Insurance Mathematics and Economics. 2013;**32**(1):19-36

[19] Koekemoer G, Swanepoel JWH. Transformation Kernel density estimation with applications. Journal of Computational and Graphical Statistics. 2008;**17**(3):750-769

[20] Bean A. Transformations and Bayesian Estimation of Skewed and Heavy-Tailed Densities. Ohio State University; OhioLINK Electronic Theses and Dissertations Center; 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1503015935192212

[21] Othman SA, Ali HTM. Improvement of the nonparametric estimation of functional stationary time series using Yeo-Johnson transformation with application to temperature curves. Advances in Mathematical Physics. 2021; **2021**. DOI: 10.1155/2021/6676400

[22] Castro PE, Lawton WH, Sylvestre EA. Principal modes of variation for processes with continuous sample curves. Technometrics. 1986; **28**(4):329-337

[23] Ferraty F, Vieu P. Curves discrimination: A nonparametric functional approach. Computational Statistics & Data Analysis. 2003; **44**(1–2):161-173

[24] Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media; 2006

[25] Febrero-Bande M, de la Fuente MO. Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software. 2012;**51**(1):1-28

[26] Atkinson AB, Riani M, Corbellini A. The Box–Cox transformation: Review and extensions. Statistical Science. 2021

[27] Soleymani S. Exact Box-Cox Analysis. The University of Western Ontario; Electronic Thesis and Dissertation Repository; 2018. https://ir.lib.uwo.ca/etd/5308/

