**1. Introduction**

The importance of statistical modeling and forecasting of time series data cannot be overemphasized. The benefits range from the easy interpretability that comes with visualizing results to demystifying the subject for the layman. 'Forecasting' means predicting the future from past and present data, and it is regularly done through the analysis of trends.

A routine example is the estimation of temperature trends for some specified future date. Compared with forecasting, prediction is the more general term.

Forecasting methods have been applied in areas ranging from climatology to finance and foreign exchange, and in different regions of the world, for better prediction and simulation. The key contribution of Information and Communication Technology (ICT) is that it lets us make predictions and simulations from previously obtained data. This approach can be applied in every area, so long as attention is paid to the rules that govern it.

In this study we apply statistical methods that can be adopted for forecasting climatic (weather) parameters in different regions of the world.

It is important to note that the predictability of the atmosphere is not perfect: although statistical methods are necessary, the results obtained are not totally accurate, which is why room for error (uncertainty) is allowed, albeit a trend can still be observed [1]. Statistical methods have been applied in the study of different regions; for example, Daniel S. Wilks [1] discusses the use of these methods in the analysis of regions that do not necessarily share the same climatic conditions. This underscores the point that the underlying laws hold irrespective of region, i.e., neglecting factors that contribute little to weather, the same methods can be applied in different regions and still yield accurate results.

Analysis of trends can be useful in depicting and predicting the changing patterns and variability of some climatic parameters. Such analysis, through the evaluation of meteorological parameters, gives proper knowledge of the changing conditions of the climate and its effects.

A data scientist using any tool or software for modeling and forecasting is particularly interested in the progression of these meteorological parameters as a function of time *t*, *f*(*t*). The designers of navigation or monitoring systems cannot trivialize the importance of forecasting, as it is a very important part of their systems. The spatial and temporal changes of atmospheric parameters call for the adoption of this analysis to discern the effects of some meteorological parameters on some variables; for example, see [2].

A very popular tool for any data scientist who wants to understand the nitty-gritty of weather forecasting is the Python programming language. This chapter explains the setup process in detail to help the layman get started. A dataset of temperature trends in Calabar, Nigeria is used at the end of the chapter to test the processes explained, for better visualization.

The applicability of results from forecasting cannot be overstated, because they are valuable information for people who depend on weather conditions, such as farmers, surfers, and event planners. The accurate prediction of atmospheric parameters can go a long way in positively affecting the finances of the informed, as money can be saved by avoiding unnecessary costs during trying times [3]. Natural disasters like tsunamis can be predicted by correlating meteorological parameters, harnessing information as explained previously and then incorporating it, through machine learning, into the design of forecasting systems.

We then delve into a review of statistical methods: the M-K test and its different variations, the Angstrom-Prescott model for the estimation of solar radiation, and linear regression techniques, with a deeper look at multiple linear regression, which is applied to predict refractivity after the coefficients of the variables are obtained. Results are then presented and explained.

## **2. Review of statistical tests/methodology**

With the ongoing shift in the world of technology, this section explains some time series forecasting methods along with their Python implementation techniques. Forecasting models are often applied to time series data to estimate future trends of meteorological parameters.

*The Role of Statistical Methods and Tools for Weather Forecasting and Modeling DOI: http://dx.doi.org/10.5772/intechopen.96854*

## **2.1 Statistical test for trend (Mann-Kendall trend test)**

One of the most important and widely applied tests for trend in time series is the Mann-Kendall trend test. It is mostly used for environmental and hydrological data. The test is non-parametric and does not require the data to conform to a particular distribution; similarly, its sensitivity to abrupt breaks due to an inhomogeneous series is very low [4]. The null hypothesis *H*<sub>0</sub>, which says that there is no monotonic trend in the series, is tested against the alternative hypothesis *H*<sub>1</sub>, which says that there is a trend in the series. The test is applied to cases where a series of data values *x<sub>i</sub>* is in agreement with the equation below:

$$x\_{i} = f(t\_{i}) + \varepsilon\_{i} \tag{1}$$

*f*(*t<sub>i</sub>*) is a function of time and *ε<sub>i</sub>* are the residuals with zero mean. The Mann-Kendall test statistic *S* is calculated using the formula

$$S = \sum\_{k=1}^{n-1} \sum\_{j=k+1}^{n} \text{sgn}\left(x\_{j} - x\_{k}\right) \tag{2}$$

where;

$$\text{sgn}\left(x\_{j} - x\_{k}\right) = \begin{cases} +1; & \text{if } \left(x\_{j} - x\_{k}\right) > 0 \\ 0; & \text{if } \left(x\_{j} - x\_{k}\right) = 0 \\ -1; & \text{if } \left(x\_{j} - x\_{k}\right) < 0 \end{cases} \tag{3}$$

*n* in Eq. (2) is the number of data values in the studied series. The advantage of this test is that it can handle situations where data values are missing for some of the years or months studied [4].
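As a rough illustration, Eqs. (2) and (3) translate directly into a few lines of Python; the short series below is invented for the demo and is not drawn from any real dataset.

```python
# Sketch of the Mann-Kendall S statistic of Eqs. (2) and (3).

def sign(diff):
    """sgn function of Eq. (3)."""
    if diff > 0:
        return 1
    if diff < 0:
        return -1
    return 0

def mk_s(series):
    """Mann-Kendall S, Eq. (2): sum of sgn(x_j - x_k) over all pairs j > k."""
    n = len(series)
    return sum(sign(series[j] - series[k])
               for k in range(n - 1)
               for j in range(k + 1, n))

# A strictly increasing series of n values gives S = n(n-1)/2.
print(mk_s([1, 2, 3, 4, 5]))  # 10
```

A strictly decreasing series gives the same magnitude with a negative sign, and a constant series gives S = 0.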

In the case where *n* is greater than or equal to 10, we adopt the normal approximation (*Z*).

To find the variance of *S,* '*VAR(S)*', we compute Eq. (4) below.

$$VAR(S) = \frac{1}{18} \left[ n(n-1)(2n+5) - \sum\_{p=1}^{g} t\_p(t\_p - 1) \left( 2t\_p + 5 \right) \right] \tag{4}$$

From the equation, the number of data values is represented by *n*, the number of tied groups is represented by *g*, and the number of data values in the *p*th group is represented by *t<sub>p</sub>*.

We now use the results from *VAR(S)* to find the test statistic *Z*

$$Z = \begin{cases} \frac{S - 1}{\sqrt{VAR(S)}}; & S > 0 \\ 0; & S = 0 \\ \frac{S + 1}{\sqrt{VAR(S)}}; & S < 0 \end{cases} \tag{5}$$

A decreasing trend can be discerned from results of Eq. (5) when the value of *Z* is negative and an increasing trend when *Z* is positive (**Table 1**).
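Eqs. (4) and (5) can be sketched in the same spirit, taking a precomputed *S* as input; this is illustrative only and not taken from any particular library's internals.

```python
import math
from collections import Counter

def mk_var(series):
    """VAR(S) from Eq. (4), with the correction term for tied groups t_p."""
    n = len(series)
    ties = Counter(series)  # size t_p of each group of equal values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in ties.values() if t > 1)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

def mk_z(s, var_s):
    """Normal-approximation Z from Eq. (5); positive Z suggests an increasing trend."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0
```

For a tie-free series of 10 values, Eq. (4) reduces to 10·9·25/18 = 125.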

The trend is considered statistically significant when the p-value of the series is lower than the significance level (α); in that case, we can say there is a trend in the series [5]. The adoption of different significance levels with respect to the number of given data values *n* is given in **Table 1**.

#### **Table 1.**

*Significance level (α) required for given numbers of data.*

The classification of this probability (significance level) is important because results can otherwise be mistaken as entirely certain. A significance level of, say, 0.05 means that there is a 5% probability of making a mistake when rejecting the null hypothesis *H*<sub>0</sub>. Similarly, a significance level of 0.01 means that there is a 1% probability of making a mistake when rejecting *H*<sub>0</sub>.

#### **2.2 Regression analysis**

The two easiest ways to forecast time series data by observation are simple regression and the moving average; both depend on historical data. The former demands mere observation of the previous trend and extrapolation from there, which can be somewhat less accurate. The moving average has been used for forecasting meteorological data like rainfall (see reference [6]). Regression analysis concerns the relationship that one dependent variable has with one or more independent variables. We use it to build models showing the strength of the relationship between the variables and any possible future relationships [1].

#### *2.2.1 Simple linear regression*

This variation of regression assumes that the two variables (dependent and independent) show a linear relationship described by an intercept and a slope, and that the variance of the residual error is constant across all observations.

$$Y = \pm mX \pm c + e \tag{6}$$

*Y* is the dependent variable, *X* is the independent variable, *m* is the slope, *c* is the intercept, and *e* is the residual error. The regression is depicted by a straight line described by Eq. (6) above (**Figure 1**).
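A minimal, dependency-free sketch of fitting Eq. (6) by least squares follows; the four points are made up so that they lie exactly on a line and the fit is exact.

```python
# Closed-form least-squares fit of Y = mX + c + e.

def linear_fit(x, y):
    """Return slope m and intercept c minimizing the squared residuals e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    m = ss_xy / ss_xx          # slope
    c = mean_y - m * mean_x    # intercept
    return m, c

# Points lying exactly on Y = 2X + 1 recover m = 2, c = 1.
m, c = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the same formulas give the best-fit line through the cloud of points, as in **Figure 1**.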

#### *2.2.2 Multiple linear regression*

This model is similar to simple linear regression, except that it has multiple independent variables instead of just one. It can be represented by Eq. (7):

$$Y = \pm m\_1 X\_1 \pm m\_2 X\_2 \pm m\_3 X\_3 \pm c + e \tag{7}$$


#### **Figure 1.**

*Schematic illustration of simple linear regression. The regression line, Y = ±mX + c + e, is chosen as the one minimizing some measure of the vertical differences (the residuals) between the points and the line. The residual e is the difference between the data point and the regression line.*

*Y* is the dependent variable.

*X*<sub>1</sub>, *X*<sub>2</sub>, *X*<sub>3</sub> are the independent variables.

*m*<sub>1</sub>, *m*<sub>2</sub>, *m*<sub>3</sub> are the slopes.

*c* is the intercept.

*e* is the residual error.

One thing to note about multiple linear regression is that the independent variables must not be collinear, i.e., they must not have a high correlation coefficient with each other; otherwise it is difficult to assess the relationship between the dependent and independent variables.

We also need to note that before multiple linear regression is performed on a range of data values, a linear relationship must exist between each independent variable and the dependent variable, and the residual error must be roughly constant at each point in the model. Multiple linear regression will be applied to study and predict the refractivity trend in Calabar, Nigeria. This was done with the 'statsmodels' package in Python, and results are displayed in section 2.5.
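The chapter itself uses the 'statsmodels' package for this step; as a dependency-light stand-in, the coefficients of Eq. (7) can also be recovered with NumPy's least-squares solver. The predictor values below are synthetic, not the NiMet data.

```python
import numpy as np

# Synthetic design matrix: columns are X1, X2, X3.
X = np.array([[1000.0, 300.0, 25.0],
              [1005.0, 303.0, 21.0],
              [ 998.0, 299.0, 26.0],
              [1002.0, 305.0, 23.0],
              [1001.0, 298.0, 29.0]])

# Build Y with known slopes m1 = 0.5, m2 = -1.2, m3 = 3.0 and intercept c = 10.
y = 0.5 * X[:, 0] - 1.2 * X[:, 1] + 3.0 * X[:, 2] + 10.0

A = np.column_stack([X, np.ones(len(X))])   # append a column of ones for c
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, m3, c = coef                        # recovers the known coefficients
```

Because the synthetic *Y* is an exact linear combination, the solver recovers the coefficients to floating-point precision; real data would leave a nonzero residual.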

A meteorological equation to which this regression technique applies perfectly is the refractivity equation recommended by the International Telecommunication Union (ITU), shown in Eq. (8):

$$N = 77.6 \frac{P}{T} + 3.73 \times 10^5 \frac{e}{T^2} \, (N\text{-units}) \tag{8}$$

*P* is the Atmospheric Pressure (hPa).

*e* is the Atmospheric Vapor Pressure (hPa).

*T* is the Absolute Temperature (K).

Eq. (8) shows the relationship between refractivity (dependent variable) and meteorological parameters (ambient temperature, atmospheric pressure, and vapor pressure) which are all independent variables.

This has been applied in [7] in modeling the meteorological parameters for the accurate determination of refractivity. These meteorological parameters (ambient temperature, atmospheric pressure and relative humidity) were obtained from the Nigerian Meteorological Agency (NiMet), Calabar.

Results have been presented in section 2.5. From Eq. (8), we obtain the atmospheric vapor pressure *e* from the relation;


$$e = \frac{e\_\text{s}H}{100}(hPa) \tag{9}$$

*e<sub>s</sub>* is the saturated vapor pressure (hPa), calculated from:

$$e\_s = 6.11 \exp\left(\frac{17.26(T - 273.16)}{T - 35.87}\right) (hPa) \tag{10}$$
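Eqs. (8)–(10) chain together naturally in code. The sketch below combines them into a single refractivity function; the sample inputs are illustrative, not NiMet measurements.

```python
import math

def saturated_vapor_pressure(T):
    """e_s in hPa from Eq. (10); T is the absolute temperature in K."""
    return 6.11 * math.exp(17.26 * (T - 273.16) / (T - 35.87))

def refractivity(P, T, H):
    """N in N-units from Eq. (8), with e = e_s * H / 100 from Eq. (9).

    P: atmospheric pressure (hPa), T: absolute temperature (K),
    H: relative humidity (%).
    """
    e = saturated_vapor_pressure(T) * H / 100.0
    return 77.6 * P / T + 3.73e5 * e / T**2

# e.g. a warm, humid coastal reading: P = 1010 hPa, T = 300 K, H = 80%
n_value = refractivity(1010.0, 300.0, 80.0)
```

With H = 0 the wet term vanishes and only the dry term 77.6 P/T remains, which is a quick sanity check on the implementation.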

## **2.3 Review of the application of simple linear regression analysis in climatology (the Angstrom-Prescott model)**

The linear regression technique can be applied to find the relationships between an independent variable and the dependent variable. We can see the explanation of this from Eq. (6).

One major example of the benefits of linear regression is the estimation of the Angstrom-Prescott coefficients for a particular region, as these relate to solar radiation. The Angstrom-Prescott model is given by [8]:

$$\frac{H}{H\_0} = a + b\frac{n}{N} \tag{11}$$

where *H*<sub>0</sub> is the monthly average daily extraterrestrial radiation, *H* is the monthly average daily global radiation in Wh/m²/day, *n* is the actual sunshine duration in a day for a particular region (hours), and *N* is the monthly mean length of the day in hours. The Angstrom-Prescott empirical coefficients are *a* and *b*. The linear regression technique was adopted by Srivastava and Pandey [8] to find *a* and *b*. Comparing Eq. (6) to Eq. (11), we have:

$$\begin{aligned} \frac{H}{H\_0} &= Y \, (\text{variable})\\ \frac{n}{N} &= X \, (\text{variable})\\ b &= m = slope\\ a &= c = Y \, \text{intercept} \end{aligned} \tag{12}$$

This shows that if we have the variables *H*/*H*<sub>0</sub> and *n*/*N*, we can get the values of *a* and *b* from our *Y* intercept and slope respectively. Obtaining these constants for specific regions will help us forecast future trends.
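A small worked example of the Eq. (12) mapping, with invented clearness-index and sunshine-fraction values chosen to lie exactly on a line so that the recovered coefficients are a = 0.25 and b = 0.5:

```python
# Fit H/H0 = a + b * (n/N) by simple linear regression (Eq. (11) via Eq. (12)).
# The data points are synthetic: y = 0.25 + 0.5 * x exactly.

sunshine_fraction = [0.30, 0.45, 0.50, 0.62, 0.70]      # n/N  -> X
clearness_index = [0.40, 0.475, 0.50, 0.56, 0.60]       # H/H0 -> Y

n_pts = len(sunshine_fraction)
mean_x = sum(sunshine_fraction) / n_pts
mean_y = sum(clearness_index) / n_pts

b = sum((x - mean_x) * (y - mean_y)
        for x, y in zip(sunshine_fraction, clearness_index)) \
    / sum((x - mean_x) ** 2 for x in sunshine_fraction)  # slope     -> b
a = mean_y - b * mean_x                                  # intercept -> a
```

Real monthly data would scatter around the line, and *a*, *b* would then be the region-specific best-fit coefficients.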

For better understanding, the extraterrestrial radiation *H*<sup>0</sup> is given by the equation [9];

$$\begin{aligned} H\_0 &= \frac{24 \times 3600 \times I\_{SC}}{\pi} \times \left[ 1 + 0.033 \cos \left( \frac{360 \times d}{365} \right) \right] \\ &\times \left[ \cos \phi \cos \delta \sin \omega + \frac{\pi \omega}{180} \sin \phi \sin \delta \right] \end{aligned} \tag{13}$$

Here, *I<sub>SC</sub>* is the solar constant with a value of 1367 W/m², and *d* represents the day of the year (from January 1st to December 31st), taking January 1st as 1 and December 31st as 365 or 366 (in the case of a leap year). The latitude of the study location, the declination angle and the sunset hour angle are represented by *ϕ*, *δ*, and *ω* respectively, with *ω* = cos<sup>−1</sup>(−tan *ϕ* tan *δ*). The declination angle can be obtained from [9]:

$$\delta = 23.45 \sin \left[ 360 \left( \frac{284 + d}{365} \right) \right] \tag{14}$$
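Eq. (14) is straightforward to evaluate; a short sketch, with *d* the day of the year:

```python
import math

def declination(d):
    """Declination angle delta in degrees for day-of-year d, Eq. (14)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + d) / 365.0))

# Around the June solstice (d ~ 172) delta approaches +23.45 degrees;
# around the December solstice (d ~ 355) it approaches -23.45 degrees;
# at d = 81 (late March) it passes through zero.
```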


#### **Figure 2.**

*Yearly variation of declination angle δ with respect to the days of the year.*

The monthly mean length of the day (in hours) can be obtained from [9].

$$N = \frac{2\omega}{15} \tag{15}$$

The above equations can be used to estimate the coefficients via linear regression, and these coefficients can then be used to predict solar radiation for a given region.
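Combining the sunset hour angle ω = cos⁻¹(−tan ϕ tan δ) with Eq. (15) gives the day length; a sketch with all angles in degrees:

```python
import math

def day_length_hours(latitude_deg, declination_deg):
    """Monthly mean day length N (hours) from the sunset hour angle and Eq. (15)."""
    cos_omega = -math.tan(math.radians(latitude_deg)) \
                * math.tan(math.radians(declination_deg))
    omega = math.degrees(math.acos(cos_omega))  # sunset hour angle in degrees
    return 2.0 * omega / 15.0

# At the equator (latitude 0) the day is 12 hours long for any declination;
# at higher latitudes the day lengthens or shortens with the sign of delta.
```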

We know that the declination angle ranges over −23.5° ≤ *δ* ≤ +23.5°. From **Figure 2**, we can see that the declination angle is 0° at the vernal and autumnal equinoxes, while it reaches +23.5° at the summer solstice and −23.5° at the winter solstice. It is easy to see why this has a huge effect on the variation of global solar radiation.

Klein in 1977 [10] recommended average days for the various months and the corresponding declination angles, as in **Table 2**.

#### **2.4 Calculus in climatology**

Applying calculus in environmental science is important for prediction. It can be used to understand the impact of parameters on the variation of other parameters they relate to. Calculus is the 'mathematical study of continuous change', so it can be applied in climatology to discern the impact of some parameters on the "continuous change" of others [11–13].

Writing the refractivity equation in terms of relative humidity *H*, by substituting (10) into (9) and then into (8), we have:

$$N = 77.6 \frac{P}{T} + 3.73 \times 10^5 \frac{6.11 \exp\left(\frac{17.26(T - 273.16)}{T - 35.87}\right) \times 0.01 H}{T^2} (N - \text{units})\tag{16}$$

Similarly, obtaining refractivity in terms of the saturated vapor pressure *es* using Eq. (8) and (9) gives;

$$N = 77.6 \frac{P}{T} + 3.73 \times 10^5 \frac{e\_s H}{100 T^2} (N - \text{units}) \tag{17}$$

Now applying partial differentials to the equations for refractivity; Eqs. (8), (16), and (17), we obtain partial differentials relating each parameter to refractivity;

$$\begin{aligned} \frac{\partial N}{\partial P} &= \frac{77.6}{T} \\\\ \frac{\partial N}{\partial T} &= -\left(77.6 \frac{P}{T^2} + 7.46 \times 10^5 \frac{e}{T^3}\right) \\\\ \frac{\partial N}{\partial H} &= \frac{22790.3 \exp\left[\frac{17.26(T - 273.16)}{T - 35.87}\right]}{T^2} \\\\ \frac{\partial N}{\partial e} &= \frac{3.73 \times 10^5}{T^2} \\\\ \frac{\partial N}{\partial e\_s} &= \frac{3.73 \times 10^3 \times H}{T^2} \end{aligned} \tag{18}$$
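One way to sanity-check these partials is to compare an analytic derivative with a central finite difference. The sketch below does this for ∂N/∂T from Eq. (8), with the vapor pressure *e* held fixed; the sample values are illustrative.

```python
import math

def refractivity_pe(P, T, e):
    """Eq. (8): N from pressure P (hPa), temperature T (K), vapor pressure e (hPa)."""
    return 77.6 * P / T + 3.73e5 * e / T**2

def dN_dT(P, T, e):
    """Analytic partial dN/dT from Eq. (18), e held constant."""
    return -(77.6 * P / T**2 + 7.46e5 * e / T**3)

P, T, e = 1010.0, 300.0, 28.0
h = 1e-3  # small temperature step (K)

# Central finite difference approximation of dN/dT.
numeric = (refractivity_pe(P, T + h, e) - refractivity_pe(P, T - h, e)) / (2 * h)
analytic = dN_dT(P, T, e)
```

The two values agree closely, and the negative sign confirms that, at fixed pressure and vapor pressure, refractivity falls as temperature rises.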

From monthly temperature, humidity and atmospheric pressure data obtained for 2005–2018 from the archives of the Nigerian Meteorological Agency (NiMet), Calabar, the atmospheric vapor pressure and the saturated vapor pressure can be obtained by applying these parameters in Eqs. (9) and (10) (**Figure 3**).

#### **2.5 Python implementation for Mann-Kendall trend test**

With the Python software installed, the next step is installing an IDE (integrated development environment). The easiest IDE to use is the Jupyter Notebook, which displays results as you code.

We will walk through the process of analyzing data using the dataset for Calabar in southern Nigeria, collected from the archives of the Nigerian Meteorological Agency (NiMet). Research has been done on the climatology of this area [14–18], but the application of Python and the Mann-Kendall test can give more meaning to the time series data.


#### **Table 2.**

*Recommended average days for various months and their corresponding declination angles [10].*


#### **Figure 3.**

*Map of study area showing Calabar as a coastal area (left) and the exact location of the Nigerian meteorological agency (NiMet) where the data was obtained (right).*

We need to install the Python package for the Mann-Kendall test, called '*pymannkendall*' (e.g. with `pip install pymannkendall`). In addition, we need the '*pandas*' package for handling and cleaning data and the '*matplotlib*' package for data visualization.

We want to analyze maximum ambient temperature data for 20 years in Calabar.

In the Jupyter Notebook, the first step is to import the respective packages. Note that for the examples in the Appendices, we stored the Excel file containing the data used for the analysis in the same folder as the Python file, for easy reference.

*Appendix A* shows the process of importing the installed packages required for the analysis into the workspace.

Before we perform the Mann-Kendall test, we need to import the Excel file titled '*Temperature*', in which the data is stored in a sheet named '*MAX*'. See *Appendix B*.

*Appendix C* shows how the original Mann-Kendall test is performed after importing the packages and data. We assigned the imported data the name '*Max*' and set the significance level (α) to the default 5% (0.05); this can be adjusted by the user. The results obtained are displayed in *Appendix C*.

We now perform the seasonal M-K test for the dry season variation: we import the Excel file titled '*Temperature*', with the date column as an index column. The sheet of the Excel file in which the data is stored is named 'dry'. This implementation can be seen in *Appendix D*.

*Appendix E* shows the Python implementation of the seasonal M-K test for the dry season variation. By setting the significance level (α) to the default 5% (0.05) and the period to 4, for the 4 months of the dry season in the study area (November to February), we satisfy the criteria for the seasonal M-K test.

For the wet season variation, the Excel file titled '*Temperature*' is imported with the date as an index column. The sheet is named 'wet'. *Appendix F* shows the implementation code for this importation.

We can now perform the seasonal Mann-Kendall test on the wet season data, as shown in *Appendix G*. The test on the imported file, which we named 'wet', was performed by setting the significance level (α) to the default 5% (0.05), adjustable by the user, and the period to 8, for the 8 months of the wet season in the study area (March to October).

There are other variations of the Mann-Kendall test along with their python implementation [19]. These can be used depending on the data obtained and the aim of the test.


