#### **3.4 Model setup**

The current study compares the accuracy of two machine learning methods, MARS and M5Tree, for predicting daily relative humidity (RH) from different input combinations of precipitation, temperature, and relative humidity. Both models were applied separately to three meteorological stations: Khunjerab, Naltar, and Ziarat. The flowchart of the study is displayed in **Figure 2**. For each station and each model, ten input combinations were developed to identify the best combination for RH prediction.

Initially, three combinations of preceding RH values, (i) RH<sub>t-1</sub>; (ii) RH<sub>t-1</sub> and RH<sub>t-2</sub>; and (iii) RH<sub>t-1</sub>, RH<sub>t-2</sub>, and RH<sub>t-3</sub>, were tried with both models to predict the current RH (RH<sub>t</sub>). Next, three precipitation combinations ((i) P<sub>t-1</sub>; (ii) P<sub>t-1</sub>, P<sub>t-2</sub>; (iii) P<sub>t-1</sub>, P<sub>t-2</sub>, P<sub>t-3</sub>) and three temperature combinations ((i) T<sub>t-1</sub>; (ii) T<sub>t-1</sub>, T<sub>t-2</sub>; (iii) T<sub>t-1</sub>, T<sub>t-2</sub>, T<sub>t-3</sub>) were each added separately to the best RH combination. In the tenth and last combination, the best temperature and precipitation inputs were added together to the best RH combination to see the combined effect of both parameters on the models' accuracy in predicting RH.
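The ten input combinations described above can be enumerated programmatically. The following is a minimal sketch, not the authors' code: the function names (`lags`, `input_combinations`, `make_samples`) and the parameters `best_rh`, `best_p`, and `best_t` are hypothetical, and the numeric values are made up for illustration. The lag depths passed in stand in for whichever RH, precipitation, and temperature inputs turn out best after fitting.

```python
def lags(var, depth):
    """Return (variable, lag) pairs for lags 1..depth, e.g. RH(t-1), RH(t-2)."""
    return [(var, k) for k in range(1, depth + 1)]


def input_combinations(best_rh, best_p, best_t):
    """List the ten input combinations in the order given in the text.

    best_rh, best_p, best_t are hypothetical lag depths (1 to 3) standing in
    for whichever RH, precipitation and temperature inputs performed best.
    """
    combos = [lags("RH", d) for d in (1, 2, 3)]            # 1-3: RH only
    base = lags("RH", best_rh)
    combos += [base + lags("P", d) for d in (1, 2, 3)]     # 4-6: best RH + P
    combos += [base + lags("T", d) for d in (1, 2, 3)]     # 7-9: best RH + T
    combos.append(base + lags("P", best_p) + lags("T", best_t))  # 10: all
    return combos


def make_samples(data, combo, target="RH"):
    """Build (inputs, target) rows; the first max-lag days lack full history."""
    max_lag = max(k for _, k in combo)
    X, y = [], []
    for t in range(max_lag, len(data[target])):
        X.append([data[var][t - k] for var, k in combo])
        y.append(data[target][t])
    return X, y


# Made-up values, for illustration only (not station data).
data = {
    "RH": [60.0, 62.0, 58.0, 65.0, 70.0],
    "P": [0.0, 1.2, 0.0, 3.4, 0.5],
    "T": [4.1, 3.8, 5.0, 2.9, 3.3],
}
combos = input_combinations(best_rh=2, best_p=1, best_t=1)
X, y = make_samples(data, combos[9])
print(len(combos), X[0], y[0])
```

Each row of `X` holds the lagged predictors for one day and `y` holds the RH of that day, so either model can be fitted on the same rows for a given combination.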

The analysis uses daily precipitation, temperature, and relative humidity data from 1995 to 2009. About 75% of the data (1995 to 2006) was used for training and the remaining 25% (2007 to 2009) for testing both machine learning models.

**Figure 2.** *Flowchart of the study.*

By comparison, [8] used only two years of data (2008 to 2009) to train an LSTM model, which might not be enough for reliable predictions.
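The chronological train/test split described in this section can be sketched as follows. This is an illustrative sketch only: `split_by_year` and the record layout (a date followed by values) are assumptions, and the placeholder records simply cover every calendar day from 1995 to 2009. The exact training and testing fractions depend on how many daily records each year actually contains.

```python
from datetime import date, timedelta


def split_by_year(records, last_train_year=2006):
    """Split (date, value, ...) records chronologically at a year boundary."""
    train = [r for r in records if r[0].year <= last_train_year]
    test = [r for r in records if r[0].year > last_train_year]
    return train, test


# Placeholder records: one entry per calendar day, with a dummy value.
start, end = date(1995, 1, 1), date(2009, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
records = [(d, 0.0) for d in days]

train, test = split_by_year(records)
print(train[0][0], train[-1][0], test[0][0], test[-1][0])
```

Splitting by date rather than at random keeps all testing days later than all training days, which matches the lagged-input setup where each prediction uses only preceding values.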

#### **3.5 Model evaluation criteria**

The models' accuracy in predicting relative humidity was evaluated against observed data using statistics that are commonly used in the related literature: the coefficient of determination (R<sup>2</sup>), the root mean square error (RMSE), and the mean absolute error (MAE), as shown in Eqs. (3)-(5).

$$R^2 = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} \left( RH_i - \overline{RH} \right)^2}{\frac{1}{n} \sum_{i=1}^{n} \left( rh_i - \overline{rh} \right)^2} \tag{3}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (RH_{iO} - RH_{iM})^2}{N}} \tag{4}$$

$$MAE = \frac{\sum_{i=1}^{N} |RH_{iO} - RH_{iM}|}{N} \tag{5}$$

Here, $rh_i$ is the observed relative humidity and $\overline{rh}$ is its mean; $RH_i$ is the predicted relative humidity and $\overline{RH}$ is its mean; and *N* (written *n* in Eq. (3)) is the number of data points. Moreover, $RH_{iO}$ is the observed relative humidity and $RH_{iM}$ is the modeled relative humidity. Previous studies [56–61] suggested that a single statistical indicator cannot adequately assess the prediction accuracy of soft computing models. Therefore, the current study used three statistical indicators to judge prediction accuracy with confidence. When the models' error distributions are normal and uniform, error statistics such as RMSE and MAE are more suitable. For an ideal model, RMSE and MAE equal 0, whereas R<sup>2</sup> equals 1. Among the compared models, the one with the smallest MAE and RMSE is considered the best.
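As a worked illustration of these statistics, the sketch below computes RMSE and MAE as in Eqs. (4) and (5). For R<sup>2</sup> it uses the common residual-over-total sum-of-squares form, which is an assumption: it may differ from Eq. (3) as printed, and it is chosen here because it returns 1 for an ideal model, in line with the text. The function names and sample values are hypothetical.

```python
import math


def rmse(obs, mod):
    """Root mean square error, Eq. (4)."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def mae(obs, mod):
    """Mean absolute error, Eq. (5)."""
    return sum(abs(o - m) for o, m in zip(obs, mod)) / len(obs)


def r2(obs, mod):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - m) ** 2 for o, m in zip(obs, mod))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Made-up observed and modeled RH values (%), for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
modeled = [52.0, 58.0, 73.0, 79.0]

print(rmse(observed, modeled), mae(observed, modeled), r2(observed, modeled))
```

Computing all three statistics on the same testing data makes it straightforward to compare models and input combinations on each criterion.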
