### **1. Introduction**

With advances in computing, simulation programs have become useful and indispensable tools for researchers and designers. They help users optimize design and system parameters without spending money on experimental models or time on experiments. Simulation programs in the field of solar water distillation are no exception. To run solar water distillation simulations, users need to provide weather data such as solar irradiance and ambient temperature at daily, hourly or shorter intervals, depending on the requirements of the software. However, hourly weather data are not always available, especially in developing countries, because measuring them requires equipment, time and money. According to Duffie and Beckman [1], these weather data must be collected for at least 8 years so that averaging removes weather anomalies such as the El Niño and La Niña phenomena.

Another solution is to use typical meteorological year (TMY) data. The TMY concept is derived from long-term weather data: correlations and statistical distributions are analyzed to determine characteristic indices that represent the average behavior [2]. Selection criteria are then applied to extract month-by-month data from a 23-year record. TMY data were established for 26 Canadian sites, and a similar concept, the test reference year (TRY), was applied in Europe [2]. While this approach reduces the computational effort and the database required to run simulations, it still relies on long-term data, which are not available in most places in the world, especially in developing countries.

As pointed out by Nguyen and Hoang [3], the shortage of weather data, especially solar radiation data, is very serious in developing countries. For example, in Vietnam, out of a total of 171 hydro-meteorological stations, only 12 record total solar radiation, of which only 9 have continuous measurements. The remaining stations record only the number of sunshine hours. Furthermore, radiation is read manually every 3 hours instead of hourly. Therefore, hourly radiation data from hydro-meteorological stations in this country are not reliable enough for simulation programs of solar energy systems.

There are two ways to address the lack of measured data at the site of interest: (i) extrapolating data from nearby hydro-meteorological stations with similar climate features, and (ii) using synthetic generation to produce a series of weather data from inputs as coarse as monthly averages. However, the first method can lead to large errors, and very few developing countries have such data available [2]. Therefore, the second method has been studied and developed by many researchers.

Many researchers have proposed mathematical models to generate complete series of weather data. Fernandez-Peruchena et al. [4] and Boland [2] used numerical methods to probabilistically simulate daily and hourly solar irradiance series. Brecl and Topic [5] used a similar approach to generate daily and hourly solar irradiance data from average daily irradiance values. Bright et al. [6] and Hofmann et al. [7] also applied statistical probabilistic techniques to generate series of solar irradiance values at 1-minute or 5-minute resolution from hourly solar irradiance data. Soubdhan and Emilion [8] even used a stochastic method to generate a sequence of solar radiation at a resolution of seconds. Magnano et al. [9] applied the same technique to generate a synthetic sequence of half-hourly temperatures. A common feature of these studies is the use of the probability distribution function (PDF) of the data to transform the random variables to a Gaussian distribution [10].

#### *Generating Artificial Weather Data Sequences for Solar Distillation Numerical Simulations DOI: http://dx.doi.org/10.5772/intechopen.100930*

Gafurov et al. [11] took another approach, incorporating the spatial correlation of solar irradiance (SCSR) into stochastic solar irradiance generation models to produce monthly and daily solar irradiance time series.

Recently, several researchers have used different types of artificial neural networks (ANNs) to model total solar irradiance on horizontal surfaces [12–16]. Wu and Chan [17] used a combined model of ARMA (autoregressive moving average) and TDNN (time delay neural network) to predict hourly solar irradiance in Singapore. However, Mora-Lopez [10] pointed out that these methods act as "black boxes" when producing and analyzing averages of daily global irradiance, so no important information about the underlying process can be obtained from them. Mora-Lopez et al. [18] proposed using machine learning theory combined with probabilistic finite automata (PFA) to calculate total daily solar irradiance values. The limitations of this method are that PFA are complex to use and that the method has not been shown to be universally applicable.

The above review shows that stochastic methods remain widely applicable, are simple, and require minimal input data. Therefore, in this study, a stochastic technique was chosen to generate series of weather data, namely daily and hourly solar irradiance and ambient temperature, from monthly averages. These are important weather inputs for numerical simulations of solar distillation systems. First, a stochastic model is used to generate a synthetic sequence of daily irradiance from monthly average daily solar irradiance values. The generated daily radiation sequences are then used to generate hourly solar radiation sequences. Similarly, a model for generating hourly temperature series from monthly mean temperature values is presented.

### **2. Stochastic model for daily radiation data generation**

#### **2.1 Model of generating daily solar radiation sequence**

Analyzing 300 months of solar radiation data from 9 hydro-meteorological stations with different weather characteristics, Aguiar et al. [19] found that, for any time period, the daily solar irradiance values follow a probability distribution function that appears to depend on the monthly mean clearness index, *KT*, of that period. They also found that the radiation value of any given day is statistically related to the value of the previous day. Based on these findings, Aguiar and colleagues built a library of 10 Markov transition matrices (the MTM library) from the 300 months of data mentioned above. The library comprises one matrix for *KT* ≤ 0.3, typical of months with a very low direct radiation component; eight matrices for months in which *KT* lies between 0.3 and 0.7, each covering an interval of 0.05; and a final matrix for *KT* > 0.7, for months with a high direct radiation component. The MTM library was then used to generate daily radiation series from average daily irradiance values for locations in the United States whose data had not been used to build the library. The simulated series were compared with measured radiation data and with the results of Graham's model [20]. In terms of statistical parameters such as the mean, variance and probability density function, as well as sequential characteristics (e.g. autocorrelation functions), Aguiar's model produced more accurate data series than Graham's model. Furthermore, Aguiar's model is computationally simpler than Graham's model [21].
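The matrix selection and the day-to-day Markov dependence described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bin edges are taken as half-open intervals of width 0.05, the value inside a state interval is drawn uniformly (a simplification), and `PLACEHOLDER_MATRIX` is a uniform stand-in because the actual entries of the MTM library are not reproduced in this chapter.

```python
import random
from typing import List, Sequence


def select_matrix_index(kt_monthly: float) -> int:
    """Map the monthly mean clearness index to one of the 10 library matrices.

    Index 0: KT <= 0.30; indices 1-8: 0.05-wide bins over (0.30, 0.70];
    index 9: KT > 0.70. The treatment of bin edges is an assumption.
    """
    if kt_monthly <= 0.30:
        return 0
    if kt_monthly > 0.70:
        return 9
    # The small epsilon keeps values on an upper bin edge in the lower bin.
    return min(8, 1 + int((kt_monthly - 0.30 - 1e-9) / 0.05))


def next_state(row: Sequence[float], rng: random.Random) -> int:
    """Sample the next state from one row of a transition matrix."""
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(row):
        cumulative += p
        if r < cumulative:
            return j
    return len(row) - 1  # guard against rounding in the row sum


def generate_daily_kt(matrix: Sequence[Sequence[float]], kt_min: float,
                      kt_max: float, n_days: int, kt_start: float,
                      rng: random.Random) -> List[float]:
    """Generate a daily KT series as a first-order Markov chain.

    Each state is an equal-width KT interval between kt_min and kt_max.
    The state of day d depends only on the state of day d-1.
    """
    n_states = len(matrix)
    width = (kt_max - kt_min) / n_states
    state = min(n_states - 1, max(0, int((kt_start - kt_min) / width)))
    series = []
    for _ in range(n_days):
        state = next_state(matrix[state], rng)
        # Assumption: uniform draw inside the selected state interval.
        series.append(kt_min + (state + rng.random()) * width)
    return series


# Placeholder only: a uniform 10 x 10 matrix, NOT taken from the MTM library.
PLACEHOLDER_MATRIX = [[0.1] * 10 for _ in range(10)]

if __name__ == "__main__":
    rng = random.Random(0)
    print(select_matrix_index(0.52))
    print(generate_daily_kt(PLACEHOLDER_MATRIX, 0.05, 0.85, 5, 0.5, rng))
```

In practice the placeholder would be replaced by the library matrix returned for the month's *KT*, together with the KT range that matrix covers.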

In this study, the Aguiar method was chosen to calculate the daily KT values from the monthly mean *KT* for the following reasons:


#### **2.2 Data used to evaluate the accuracy of the model**

Solar irradiance was measured in two cities representing two tropical climates to evaluate the accuracy of the daily solar radiation data generated by the model selected in this study: Ho Chi Minh City, representing the tropical savanna climate (Aw), and Da Nang, representing the tropical monsoon climate (Am). Pyranometers measured the total irradiance on a horizontal plane in the two cities every 5 minutes, continuously from 5:30 a.m. to 6:30 p.m. Since both cities lie at low latitudes (10° N and 16° N, respectively), the day length does not change much during the year, so the measurement window did not need to be extended seasonally [3]. The hourly, daily and monthly average daily solar irradiance values were then calculated from the measured data. **Table 1** presents the monthly average daily solar irradiance of the two cities, which was used to run the program that generates the daily and hourly irradiance data in this study.
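As an illustration of how 5-minute pyranometer readings could be aggregated into hourly totals, daily totals and monthly average daily values, consider the following sketch. It assumes that the readings are in W/m², that each sample represents its whole 5-minute interval (rectangle rule), and that a day contains 156 samples (5:30 a.m. to 6:30 p.m.). The function names are hypothetical, and the chapter does not state which integration rule the authors used.

```python
from typing import Dict, List, Sequence

SAMPLE_SECONDS = 5 * 60  # sampling interval stated in the text


def daily_total_mj(irradiance_w_m2: Sequence[float]) -> float:
    """Integrate one day of 5-minute samples (W/m^2) into MJ/m^2."""
    return sum(irradiance_w_m2) * SAMPLE_SECONDS / 1e6


def hourly_totals_mj(irradiance_w_m2: Sequence[float],
                     samples_per_hour: int = 12) -> List[float]:
    """Split one day of samples into consecutive hours and integrate each."""
    return [
        sum(irradiance_w_m2[i:i + samples_per_hour]) * SAMPLE_SECONDS / 1e6
        for i in range(0, len(irradiance_w_m2), samples_per_hour)
    ]


def monthly_mean_daily(daily_totals: Sequence[float],
                       months: Sequence[int]) -> Dict[int, float]:
    """Average the daily totals of each month (months[i] labels day i)."""
    grouped: Dict[int, List[float]] = {}
    for total, month in zip(daily_totals, months):
        grouped.setdefault(month, []).append(total)
    return {m: sum(v) / len(v) for m, v in grouped.items()}


if __name__ == "__main__":
    # Hypothetical constant reading of 500 W/m^2 over the 13-hour window.
    day = [500.0] * 156
    print(daily_total_mj(day))
    print(hourly_totals_mj(day)[:3])
```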

From the series of daily radiation values of 365 days of the year, the series of 365 values of the daily clearness index is calculated according to the following Eq. (1):

$$K_{T\,\text{met.}} = \frac{H}{H_0} \tag{1}$$

where *H* is the total daily solar radiation measured on a horizontal plane and *H*<sub>0</sub> is the daily radiation outside the atmosphere, calculated by Eq. (2):


**Table 1.**

*Monthly average daily total solar irradiance measured on a horizontal plane in Ho Chi Minh City and Da Nang (MJ/m<sup>2</sup>·day).*


$$\begin{aligned} H_0 &= \frac{24 \times 3600 \, G_{sc}}{\pi} \left[ 1 + 0.033 \cos \left( \frac{360 n}{365} \right) \right] \\ &\quad \times \left[ \cos \phi \cos \delta \sin \omega_s + \frac{\pi \omega_s}{180} \sin \phi \sin \delta \right] \end{aligned} \tag{2}$$

where *G*<sub>sc</sub>, *n*, *ϕ*, *δ* and *ω*<sub>s</sub> are, respectively, the solar constant, the day of the year, the latitude of the investigated location, the declination angle and the sunset hour angle, as defined in [1].

From the values of daily clearness index KT, the monthly average daily values of clearness index *KT* for 12 months of the year are calculated:

$$
\overline{K_T} = \frac{\overline{H}}{\overline{H_0}} \tag{3}
$$

where the monthly average daily irradiance values *H* are taken from **Table 1**, and the monthly average daily irradiance values outside the atmosphere, *H*<sub>0</sub>, are calculated by Eq. (2) with *n* being the average day of the month given in [1]. **Table 2** presents the *KT* values of the two investigated cities.
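Eqs. (1)–(3) can be sketched in Python as shown below. The chapter refers to [1] for the solar constant, declination, sunset hour angle and average day of each month, so the specific forms used here are assumptions based on the standard expressions in that reference: *G*<sub>sc</sub> = 1367 W/m², Cooper's equation for the declination, cos *ω*<sub>s</sub> = −tan *ϕ* tan *δ*, and the commonly tabulated average days. The function names are hypothetical.

```python
import math
from typing import List, Sequence

G_SC = 1367.0  # solar constant in W/m^2 (assumed value)

# Average day of each month as day of the year (assumed, as tabulated in [1]).
AVERAGE_DAY = [17, 47, 75, 105, 135, 162, 198, 228, 258, 288, 318, 344]


def declination_deg(n: int) -> float:
    """Declination angle in degrees (Cooper's equation, assumed form)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def sunset_hour_angle_deg(lat_deg: float, n: int) -> float:
    """Sunset hour angle in degrees from cos(ws) = -tan(phi) * tan(delta)."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(n))
    # Clamping only matters near the poles, not at tropical latitudes.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    return math.degrees(math.acos(x))


def extraterrestrial_daily(lat_deg: float, n: int) -> float:
    """Daily extraterrestrial radiation H0 from Eq. (2), in MJ/m^2."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(n))
    ws_deg = sunset_hour_angle_deg(lat_deg, n)
    ws = math.radians(ws_deg)
    eccentricity = 1.0 + 0.033 * math.cos(math.radians(360.0 * n / 365.0))
    geometry = (math.cos(phi) * math.cos(delta) * math.sin(ws)
                + math.pi * ws_deg / 180.0 * math.sin(phi) * math.sin(delta))
    h0_joules = 24.0 * 3600.0 * G_SC / math.pi * eccentricity * geometry
    return h0_joules / 1e6


def daily_clearness_index(h_mj: float, lat_deg: float, n: int) -> float:
    """Daily clearness index KT = H / H0, Eq. (1)."""
    return h_mj / extraterrestrial_daily(lat_deg, n)


def monthly_clearness_index(h_mean_mj: Sequence[float],
                            lat_deg: float) -> List[float]:
    """Monthly mean clearness index, Eq. (3), with H0 at the average day."""
    return [h / extraterrestrial_daily(lat_deg, n)
            for h, n in zip(h_mean_mj, AVERAGE_DAY)]


if __name__ == "__main__":
    # Latitude 10 deg N as quoted in the text; 18.0 MJ/m^2 is a made-up input.
    print(extraterrestrial_daily(10.0, 17))
    print(daily_clearness_index(18.0, 10.0, 17))
```

Passing the 12 values of **Table 1** for a city, together with its latitude, to `monthly_clearness_index` would give the corresponding row of **Table 2** under these assumptions.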

#### **2.3 Applying Aguiar's model**

**Figure 1** presents the procedure for calculating the series of daily clearness index from the monthly average daily values.

After the 365 values of the daily clearness index are calculated for each location, the generated series is compared with the measured series through statistical functions, namely the cumulative distribution function (CDF) and the probability density function (PDF). **Figures 2** and **3** present the CDF of the calculated and measured KT in Ho Chi Minh City and Da Nang, while **Figures 4** and **5** present the PDF for these two cities. Statistical parameters, including the mean, median, minimum, maximum, standard deviation, mean absolute error (MAE) and root mean square error (RMSE), were also compared between the calculated and measured KT series, as shown in **Tables 3** and **4**.
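The comparison statistics named above can be computed along the following lines. This is a sketch using only the Python standard library; the function names are hypothetical. The chapter does not state how the generated and measured series are paired for MAE and RMSE, so the functions simply pair elements in the order given (sorting both series first would compare their distributions instead). The sample standard deviation and a 10-bin histogram estimate of the PDF are likewise assumptions.

```python
import math
from bisect import bisect_right
from statistics import mean, median, stdev
from typing import Dict, List, Sequence


def empirical_cdf(values: Sequence[float], x: float) -> float:
    """Fraction of values less than or equal to x."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


def histogram_pdf(values: Sequence[float], bins: int = 10,
                  lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Histogram estimate of the PDF; the densities integrate to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / (len(values) * width) for c in counts]


def mae(generated: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute error between two equally long series."""
    return sum(abs(g - m) for g, m in zip(generated, measured)) / len(measured)


def rmse(generated: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error between two equally long series."""
    squared = sum((g - m) ** 2 for g, m in zip(generated, measured))
    return math.sqrt(squared / len(measured))


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics of one KT series."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "std": stdev(values),  # sample standard deviation (assumption)
    }


if __name__ == "__main__":
    # Made-up short series, for illustration only.
    generated = [0.42, 0.55, 0.61, 0.38, 0.50]
    measured = [0.40, 0.57, 0.58, 0.41, 0.49]
    print(summarize(generated))
    print(mae(generated, measured), rmse(generated, measured))
    print(empirical_cdf(measured, 0.5))
```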

The results in **Figures 2**–**5** show that the Aguiar model produced a daily KT series with an acceptable level of accuracy compared with the measured series. Similarly, the statistical parameters in **Tables 3** and **4** show that the error between the calculated and measured series is relatively small. Specifically, the errors in the mean and median of the generated series are 1% and −4% for Ho Chi Minh City and 6% and 14% for Da Nang, respectively. Therefore, this model is expected to be usable for generating a series of daily clearness index values for any location, since the Aguiar model has been shown to be widely applicable around the world [2, 4, 5, 19, 22]. As shown above, the model requires only 12 monthly average daily solar irradiance values at the location of interest as input.


**Table 2.**

*Monthly average daily values of clearness index in Ho Chi Minh City and Da Nang.*

*Distillation Processes - From Solar and Membrane Distillation to Reactive Distillation…*

**Figure 1.** *The procedure for calculating the daily KT series from the monthly mean KT.*


**Figure 2.** *Cumulative distribution function of KT for Ho Chi Minh City.*

**Figure 3.** *Cumulative distribution function of KT for Da Nang.*
