### **1.1 The coronavirus in detail**

COVID-19 is the disease caused by the novel severe acute respiratory syndrome coronavirus, and an infected patient shows well-known symptoms such as fever, cough, and fatigue. Since the virus is able to spread via human-to-human contact from any infected person, it grew into a catastrophic pandemic [1, 2].

The coronavirus spread across the globe in less than six months from the first cases, which originated in the city of Wuhan, China. The World Health Organization officially declared the coronavirus outbreak a global pandemic, and individual countries have enacted measures to reduce and overcome the severe effects of the virus.

### **1.2 Problem identification**

The aim of this research is to implement multiple machine learning and computational intelligence (CI) models that can predict the number of coronavirus infections over a specified time interval. In basic ML, there are two main types of learning: supervised learning and unsupervised learning. Considering the prediction of the number of coronavirus infections in the interval February 2021–April 2021, after observing input data on the number of infections in the interval April 2020–January 2021, the problem can be framed as a supervised learning problem. In supervised learning, an algorithm learns to make associations, also called mappings, between certain inputs, which in many cases originate from a dataset, and certain outputs. Each sample originating from a dataset has an input component (x) with a corresponding output component (y). Supervised learning also aims at approximating the real underlying mapping between those inputs and outputs [3, 4].
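The supervised framing described above can be sketched as a sliding window over the series: each window of past observations becomes an input x, and the value that follows becomes the target y. The window length of 3 and the toy counts are illustrative choices, not values from the research.

```python
# Frame a univariate time series as a supervised learning problem:
# each window of past values is an input x, the next value is the target y.

def make_supervised(series, window=3):
    """Return (inputs, targets) pairs from a univariate series."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])   # input component (x)
        y.append(series[i + window])     # output component (y)
    return x, y

infections = [10, 12, 15, 21, 30, 44, 60]   # toy daily counts
x, y = make_supervised(infections, window=3)
# x[0] == [10, 12, 15] maps to y[0] == 21
```

Any supervised learner can then be trained on these (x, y) pairs to approximate the underlying mapping.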

Since the goal is to predict future values of the number of coronavirus infections, the problem can also be identified as a regression-type problem. This is the case because numerical values are predicted: given a specific input, an output is produced by a function *f* : ℝ<sup>n</sup> → ℝ. Regression-type problems can be both linear and nonlinear.

### **1.3 Introduction to time series data and time series forecasting**

Data registering the spread and development of the corona pandemic always takes a specific time point into account [5]. Likewise, all databases recording corona infections, hospitalizations, and deaths contain counts of individuals taken at specific points in time. Therefore, the underlying data can be considered time series data.

Time series data can be defined as a one-dimensional, time-ordered sequence of values of a variable that has an attached time-dependent component. It consists of measurements of any type that are observed sequentially over time, or at regular time intervals [4, 6–9].

In many cases, a time series can be identified as a vector of the type *x*<sup>(1)</sup>, *x*<sup>(2)</sup>, …, *x*<sup>(n)</sup>, where each element *x*<sup>(t)</sup> ∈ ℝ<sup>m</sup> is an array of *m* values [5]. Time series data is always temporal data. This means that data is organized over time, with a time attribute serving as an index of the observations in the dataset [9]. Time series can be modeled in various domains, ranging from financial and stock market data, to weather and earthquake forecasting, as well as pandemic modeling and medicine intake [10].

*Time Series Forecasting on COVID-19 Data and Its Relevance to International Health… DOI: http://dx.doi.org/10.5772/intechopen.104920*

Time series forecasting is an area of research aimed at analyzing past observations of a random variable in order to develop a model that best captures the underlying relationships and patterns. It also encompasses predicting future values of that random variable [1] as accurately as possible, using data that has a time component attached. All available information should be used when making a forecast, including historical data and knowledge of any future events that might impact the forecasts.

A typical time series forecasting model can be formulated as *X*<sub>t+1</sub> = *f*(*x*<sub>t</sub>, *x*<sub>t-1</sub>, …, *x*<sub>t-n+1</sub>), where *x*<sub>t</sub> is the time series data. In time series data, every point *x*<sub>t</sub> can be formulated as *x*<sub>t</sub> = *f*<sub>t</sub> + *s*<sub>t</sub> + *c*<sub>t</sub> + *e*<sub>t</sub>, which is the sum of the trend, seasonal, cyclical, and irregular components [11].
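The additive decomposition *x*<sub>t</sub> = *f*<sub>t</sub> + *s*<sub>t</sub> + *c*<sub>t</sub> + *e*<sub>t</sub> can be illustrated by constructing a series directly from its four components. The particular trend, seasonal period, and cycle below are arbitrary illustrative choices; the irregular term is set to zero so the sum is exact.

```python
import math

# Build a toy series x_t = f_t + s_t + c_t + e_t from its components:
# a linear trend, a period-4 seasonal pattern, a slow cycle, and an
# irregular term (zero here, so each x[t] is exactly the component sum).
n = 12
trend    = [0.5 * t for t in range(n)]                        # f_t
seasonal = [[1.0, -1.0, 0.5, -0.5][t % 4] for t in range(n)]  # s_t, period 4
cycle    = [math.sin(2 * math.pi * t / n) for t in range(n)]  # c_t
error    = [0.0] * n                                          # e_t

x = [trend[t] + seasonal[t] + cycle[t] + error[t] for t in range(n)]
```

Decomposition methods work in the opposite direction: given only x, they try to recover these components.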

Time series forecasting can be divided into one-step forecasting and multi-step forecasting. In one-step forecasting, only the next time step is computed from the historical inputs. In multi-step forecasting, the forecast of the previous time step is used as an input and, combined with the historical data, produces the output for the next multiple time steps [4].
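The one-step and recursive multi-step schemes can be sketched as follows. The "model" used here is deliberately trivial (the mean of the last few values); any fitted model could take its place.

```python
# One-step vs. recursive multi-step forecasting with a toy model
# (mean of the last `window` observations).

def one_step(history, window=3):
    """Forecast only the next time step from the historical inputs."""
    return sum(history[-window:]) / window

def multi_step(history, steps, window=3):
    """Feed each forecast back in as an input for the next step."""
    history = list(history)
    forecasts = []
    for _ in range(steps):
        f = one_step(history, window)
        forecasts.append(f)
        history.append(f)   # previous forecast becomes an input
    return forecasts

f = multi_step([1, 2, 3], steps=2)
```

Because each forecast is fed back in, errors can accumulate over the horizon, which is why multi-step forecasting is generally harder than one-step forecasting.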

Forecasting is different from prediction, since forecasting considers a temporal dimension, which is always contained in time series data. In this temporal dimension, future forecasts always depend on the current situation. This makes forecasting, and modeling forecasts, more difficult than predictive analysis [12].

Many people wrongly assume that forecasting future values is not possible. In fact, there are computational intelligence models that can capture patterns in data, make better forecasts than random guessing, and outperform simple models that make average or naive forecasts. Such models do not predict the exact future, but estimate it from available real-world data [13].

Time series forecasts can aid many professionals in guiding their future actions and decision-making processes. For example, they can help medical practitioners determine the course of a patient's treatment.

In academics, time series forecasting is considered one of the most profitable data mining methods and a core skill in the data analytics field, but also a relatively difficult and relatively unknown field of research [14–18].

The key challenge in time series datasets is the presence of time-dependent confounding variables. Adjusting a time series model for these time-dependent confounders remains a tremendous challenge, and many forecasts made today still contain certain biases in their results [19].

### **2. Models for time series forecasting**

Despite its relative neglect compared to other research areas in artificial intelligence, numerous and diverse types of models have been developed for time series forecasting. Two groups of models that have been refined for time series data, both recently and in the past, are classical machine learning models and deep learning neural networks.

### **2.1 Baseline models**

Over the twentieth century, simple and sometimes effective baseline models have been developed that can make forecasts somewhat better than random guessing. Two models that can be implemented efficiently and effectively as a baseline in any time series modeling problem are the average model, which forecasts the mean of all past observations, and the naive (persistence) model, which forecasts the last observed value.
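A minimal sketch of the two common baselines, the average (mean) forecast and the naive (persistence) forecast, over a toy series of infection counts:

```python
# Two standard baseline forecasters for any time series problem.

def average_forecast(history):
    """Average model: forecast the mean of all past observations."""
    return sum(history) / len(history)

def naive_forecast(history):
    """Naive (persistence) model: forecast the last observed value."""
    return history[-1]

history = [100, 120, 90, 110]
# average_forecast(history) -> 105.0, naive_forecast(history) -> 110
```

Any more sophisticated model should at minimum beat these baselines on held-out data; otherwise its added complexity is not justified.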


### **2.2 Classical machine learning models**

Classical machine learning models, also named statistical models, were originally developed from the 1960s onward for predictive analysis. These models use a variable's historical past to predict future observations [9], and they are linear. Some of these methods, like ARIMA and exponential smoothing, are still widely used, mainly because of their high accuracy, robustness, efficiency, and the fact that they can be used by non-experts in machine learning [22]. Most of these methods use a concept called lagged prediction: a prediction for time t relies on t−1 and so on, all the way back to t−n. In other words, it relies on data points from the previous period of time [23].

### *2.2.1 Auto regressive model*

A statistical model in which the value of interest is forecasted using a linear combination of past values of the variable is called an autoregressive model [24]. The term autoregression indicates that the current value of the variable is regressed against one or more of its prior values. The autoregressive (AR(p)) model is a stochastic model that assumes some form of randomness in the data. This means that future forecasts can be made with high accuracy, but never come very close to being 100% accurate [9].

These models build on the concept of regressing on previous values, also called lag terms [19]. An autoregressive model AR(p) can be formulated as:

$$y\_t = c + \phi\_1 y\_{t-1} + \phi\_2 y\_{t-2} + \dots + \phi\_p y\_{t-p} + \epsilon\_t \tag{1}$$

where *ϵ*<sub>t</sub> is the white noise or error term, and *ϕ*<sub>1</sub>, …, *ϕ*<sub>p</sub> are the model parameters [6, 7].
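The AR(p) model of Eq. (1) can be estimated by ordinary least squares: regress *y*<sub>t</sub> on its p lagged values. The sketch below is a simplified illustration (real packages also handle standard errors, stationarity checks, etc.); the doubling toy series is chosen so the fit is exact.

```python
import numpy as np

# Fit an AR(p) model by least squares: y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}.

def fit_ar(series, p=2):
    """Return (c, phi) estimated by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    # Each row is [1, y_{t-1}, ..., y_{t-p}] for one observation y_t.
    rows = [np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))]
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]          # intercept c, lag parameters phi

# On the exact relation y_t = 2*y_{t-1}, the fit recovers c ~ 0, phi_1 ~ 2.
c, phi = fit_ar([1, 2, 4, 8, 16, 32, 64], p=1)
```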

### *2.2.2 Moving average model*

A moving average model is based on error lag regression. It is a stochastic process whose output values depend linearly on the weighted sum of a white noise error and the error terms from previous time values [19, 20, 24].

A moving average model (MA(q)) builds a function of past error terms [11], and is basically the weighted sum of the current and past random errors [9]. A first-order model, MA(1), can be expressed as:

$$y\_t = c + \epsilon\_t + \theta\_1 \epsilon\_{t-1} \tag{2}$$

A higher-order MA(q) model can be expressed by the following formula:

$$y\_t = c + \epsilon\_t + \theta\_1 \epsilon\_{t-1} + \theta\_2 \epsilon\_{t-2} + \dots + \theta\_q \epsilon\_{t-q} \tag{3}$$
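Eq. (3) can be made concrete by simulating an MA(q) process: each value is the constant c plus the current error and a weighted sum of the q previous errors. The parameter values below are arbitrary illustrative choices.

```python
import random

# Simulate an MA(q) process: y_t = c + e_t + theta_1*e_{t-1} + ... + theta_q*e_{t-q}.

def simulate_ma(c, thetas, n, seed=0):
    rng = random.Random(seed)
    q = len(thetas)
    errors = [rng.gauss(0.0, 1.0) for _ in range(n + q)]  # white noise
    series = []
    for t in range(q, n + q):
        past = sum(thetas[k] * errors[t - 1 - k] for k in range(q))
        series.append(c + errors[t] + past)
    return series

y = simulate_ma(c=10.0, thetas=[0.6, 0.3], n=100)
# the long-run mean of an MA(q) process is the constant c
```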


### *2.2.3 ARIMA*

A very popular, and in many situations also effective, model in time series forecasting is the ARIMA model, an acronym for AutoRegressive Integrated Moving Average. ARIMA models combine the AR and MA models with an integrated part (the I in ARIMA) that can make data stationary by means of differencing [11, 24]. The model was originally developed by the famous statisticians Box and Jenkins around 1970.

The purpose of an ARIMA model is to describe autocorrelations in time series data [6, 21].

An ARIMA model is a typical linear model that assumes a linear correlation between the time series values. It makes use of these linear dependencies to extract local patterns, and removes high-frequency noise from the underlying data [1, 16, 25].

ARIMA models have proven to be very accurate for short-term forecasting when there is a scarcity of training data [26]. ARIMA is arguably one of the most popular and widely used linear models in time series forecasting, due to its great flexibility and performance [16].

Any ARIMA model has three main hyperparameters: p, d, and q. The p parameter stands for the number of lag observations, d defines the degree of differencing, and q gives the number of previous error terms used to predict the future value [8, 9, 20, 26, 27].

The values for p, d, and q can be determined after plotting the ACF and PACF plots. ARIMA models are relatively simple to construct, and often show better performance than more complex, structural models [28].

Eq. (4) below shows the mathematical formulation of the ARIMA model:

$$X\_t = \alpha + \beta\_1 X\_{t-1} + \beta\_2 X\_{t-2} + \dots + \beta\_p X\_{t-p} + \theta\_1 \epsilon\_{t-1} + \theta\_2 \epsilon\_{t-2} + \dots + \theta\_q \epsilon\_{t-q} \tag{4}$$

where *α* is the intercept term, *β*<sub>1</sub>, …, *β*<sub>p</sub> are the lag coefficients, *θ*<sub>1</sub>, …, *θ*<sub>q</sub> are the moving average coefficients, and *ϵ*<sub>t-1</sub>, …, *ϵ*<sub>t-q</sub> are the error terms.
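A stripped-down sketch of the ARIMA idea, here ARIMA(1, 1, 0): the I-part differences the series once to make it stationary, an AR(1) is fitted on the differences by least squares, and the forecast is un-differenced. This is a simplified illustration only; full ARIMA implementations (e.g. in statsmodels) also estimate the MA terms of Eq. (4), which are omitted here for brevity.

```python
import numpy as np

# Minimal ARIMA(1, 1, 0) sketch: difference once (d=1), fit AR(1) on the
# differences, forecast the next difference, then invert the differencing.

def arima_110_forecast(series):
    y = np.diff(np.asarray(series, dtype=float))   # d = 1: make stationary
    X = np.c_[np.ones(len(y) - 1), y[:-1]]         # rows [1, diff_{t-1}]
    (a, b), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    next_diff = a + b * y[-1]                      # AR(1) on the differences
    return series[-1] + next_diff                  # invert differencing

# A series with a constant trend has constant differences, so the
# forecast continues the trend: 5, 10, 15, 20 -> 25.
f = arima_110_forecast([5, 10, 15, 20])
```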

### *2.2.4 Exponential smoothing*

Exponential smoothing (ES) models are based on a description of trend and seasonality in the data, and the prediction is a weighted linear sum of recent past observations, or lags [24]. In single exponential smoothing, there is a parameter α, the smoothing factor, which applies an exponentially decreasing weight decay to past observations [1, 4, 6, 7]. Exponential smoothing can be formulated and computed as follows:

$$s\_t = \alpha x\_t + (1 - \alpha)s\_{t-1} = s\_{t-1} + \alpha(x\_t - s\_{t-1}) \tag{5}$$

where t > 0, and α is the smoothing factor, which can be set to any number between 0 and 1 [23].

This means that for predictive purposes, more recent observations carry more weight in the computed predictions than observations further in the past [29], especially when the smoothing factor α has a value close to 1. Any smoothing method on time series data will often yield sufficient performance on univariate data with little trend or seasonality [24]. It also requires only a small amount of computational power [23].
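The recursion of Eq. (5) is a few lines of code. The smoothed value after the last observation serves as the one-step-ahead forecast; initializing s with the first observation is a common convention, not the only option.

```python
# Single exponential smoothing, Eq. (5): s_t = alpha*x_t + (1 - alpha)*s_{t-1}.

def exp_smooth(series, alpha):
    s = series[0]                       # common initialization: s_0 = x_0
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

# A high alpha weights recent values heavily; a low alpha smooths strongly.
recent_heavy = exp_smooth([10, 20, 30], alpha=0.9)   # close to the last value
smoothed     = exp_smooth([10, 20, 30], alpha=0.1)   # close to older values
```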

### *2.2.5 Holt-winters exponential smoothing*

Holt-Winters exponential smoothing (HWES) is also called triple exponential smoothing. It is similar to other exponential smoothing models, in that the next time step is an exponentially weighted linear function of observations at prior time steps. It is a more advanced smoothing method, since it also takes trend and seasonality into account when making forecasts. Therefore, HWES is suitable for univariate time series with trend and seasonal components [24], and often performs well.
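A compact sketch of additive Holt-Winters smoothing, with a level l, a trend b, and seasonal offsets s of period m. The update equations follow the standard additive formulation; the smoothing constants and the simple initialization below are illustrative choices, and production implementations (e.g. statsmodels' `ExponentialSmoothing`) estimate them from the data.

```python
# Additive Holt-Winters (triple exponential smoothing) sketch.

def holt_winters(series, m, alpha=0.5, beta=0.3, gamma=0.2, horizon=1):
    level = sum(series[:m]) / m                  # initial level: first-season mean
    trend = 0.0                                  # simple initial trend
    season = [x - level for x in series[:m]]     # initial seasonal offsets
    for t in range(m, len(series)):
        x = series[t]
        last_level = level
        level = alpha * (x - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (x - level) + (1 - gamma) * season[t % m]
    t = len(series)
    # h-step-ahead forecast: level + h*trend + matching seasonal offset
    return [level + h * trend + season[(t + h - 1) % m]
            for h in range(1, horizon + 1)]

# On a perfectly seasonal series the forecast repeats the pattern.
f = holt_winters([10, 20, 10, 20, 10, 20], m=2, horizon=2)
```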

### **2.3 Time series forecasting with neural networks**

A neural network can be thought of as a network of neurons organized in layers, with weights attached to the network's parameters, and an activation function; during training the network converges toward minimizing or maximizing an objective [30].

An artificial neural network (ANN) takes a data-driven approach, where training depends on the available data. Furthermore, ANN models make no assumptions about the statistical distribution of the underlying time series, and are able to perform consistent nonlinear modeling [20].

The goal of any ANN is to optimize an algorithm towards an objective function. This optimization is the process of finding the values of parameters or function arguments that minimize or maximize that function [3].

ANNs are flexible, non-parametric methods that can perform nonlinear mappings from data. Similar to other machine learning methods, they are able to generalize over data. Generalization is the ability of a machine learning algorithm to perform well on new, previously unseen inputs. The generalization error, also called the test error, is the expected value of the error on a new input. It can be estimated with performance metrics by measuring performance on a test set of examples collected separately from the training set. In this research, the two performance metrics are the root mean squared error (RMSE) and the mean absolute error (MAE) [3].
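The two performance metrics can be computed directly from the true and forecast values of a held-out test set; the small arrays below are illustrative data, not results from the research.

```python
import math

# Root mean squared error and mean absolute error on a test set.

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual    = [100, 110, 120]   # toy held-out infection counts
predicted = [ 98, 115, 118]   # toy model forecasts
# RMSE penalizes large errors more heavily than MAE.
```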

Neural networks are stochastic by nature. This means that given the same model configuration and the same training set, a different internal set of weights will result each time the model is trained, leading to different performance [4].

Today, deep learning is centered around artificial neural networks, which can be defined as nonlinear functions from a set of input variables x to a set of output variables y, controlled by a vector w of adjustable parameters. These networks allow nonlinear relationships between the response variable and its predictors, and are able to overcome the challenges faced by linear statistical models [6, 20].

A typical neural network always contains an activation function, an optimization procedure, and a set of hyperparameters. Many different functions can serve as an activation function, because a neural network is able to approximate any continuous function that maps input values to output values [30]. The most commonly used activation functions are the Sigmoid, ReLU, LeakyReLU, Tanh, and Softmax functions. During the optimization procedure, an optimization algorithm makes the network converge towards the optimal solution, that is, minimizing or maximizing the objective. This can be considered as finding appropriate values of the parameters θ<sub>1</sub>, …, θ<sub>n</sub> [6, 31].
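The activation functions listed above can be written as simple scalar (or, for softmax, vector) functions:

```python
import math

# The commonly used activation functions, as plain-Python sketches.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))        # squashes to (0, 1)

def relu(x):
    return max(0.0, x)                       # zero for negative inputs

def leaky_relu(x, a=0.01):
    return x if x > 0 else a * x             # small slope below zero

def tanh(x):
    return math.tanh(x)                      # squashes to (-1, 1)

def softmax(xs):
    """Normalize exponentials so the outputs sum to 1 (a distribution)."""
    m = max(xs)                              # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax is typically used on the output layer for classification, while the others introduce nonlinearity in the hidden layers.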

In any ANN, the parameters are the weights for each variable or feature in the model. In many cases they are determined by the backpropagation algorithm and the iterations made by the optimizing function [32]. Hyperparameters in a neural network are settings whose values can be determined and manually modified from outside the algorithm itself, and that control the capacity of a model [3, 32].
