**5. Discussion**

The correlation analysis (see **Figure 1**) shows that Month, Day, WS, and P have the weakest correlation with the PV output power, whereas the solar irradiation variables and Tm have the strongest. Furthermore, RH is negatively correlated with all of the other variables: as RH rises, the PV power decreases. The correlations of Tamb, Hour, and Eff with the PV output power are moderate, neither strong nor weak. We therefore simplified the PV power forecasting method by removing Month, Day, RH, P, and WS from the input data and keeping the remaining variables as the main inputs to our regression models.

*Principal Component Analysis and Artificial Intelligence Approaches for Solar… DOI: http://dx.doi.org/10.5772/intechopen.102925*

#### **Figure 6.** *Residual boxplot.*
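The correlation-based screening of inputs described at the start of this section can be sketched as follows. This is a minimal illustration, not the procedure used in the chapter: the chapter does not state a numeric cutoff, so the `threshold` argument, the use of the absolute value of the coefficient, the function names, and the toy numbers are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def screen_inputs(features, target, threshold):
    """Keep the features whose |r| with the target is at least `threshold`.

    `threshold` is a hypothetical cutoff chosen by the user.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores

# Toy, made-up values for illustration only (not the chapter's data).
pv_power = [0.0, 1.2, 2.9, 4.1, 3.0, 1.1]
features = {
    "irradiance": [0.0, 210.0, 520.0, 760.0, 540.0, 190.0],
    "WS": [3.1, 2.9, 3.0, 3.2, 2.8, 3.0],
}
kept, scores = screen_inputs(features, pv_power, threshold=0.5)
print(kept)
```

With these toy values only the irradiance column passes the cutoff. A purely magnitude-based rule like this one would not necessarily reproduce every choice made above (RH, for instance, was removed although its correlation is negative rather than negligible), so the cutoff should be read as one possible criterion.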

The PCA identified three principal components that influence PV power and together account for 90.4% of the total variance of the variables. The PCA technique was therefore used to identify the most significant variables, which are then used in the proposed models.
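For reference, the share of variance retained by the first $k$ components is conventionally computed from the eigenvalues of the covariance (or correlation) matrix of the $p$ standardized inputs. The symbols $\lambda_i$, $k$, $p$, and $\mathrm{CEV}$ are notation introduced here for illustration and are not taken from the chapter.

$$
\mathrm{CEV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
$$

Under this reading, the figure reported above would correspond to $\mathrm{CEV}(3) \approx 0.904$.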

The performance metrics in **Tables 5** and **6** show that the CB technique provided the best agreement between forecasted and observed values, with R² = 98.21% in the testing phase and R² = 99.14% in the training phase. This is because linear models lose accuracy when the dependencies are not linear, as is the case with solar PV output. Moreover, comparing the results obtained with the raw data against those obtained with the PCA-reduced data shows that the latter are clearly superior, demonstrating the importance of this dimensionality reduction approach, which also allows for cost and efficiency savings.
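As a reminder of what the R² values measure, the coefficient of determination can be computed from observed and forecasted values as in the sketch below. It uses the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares; the function name and the toy numbers are assumptions for illustration and are not the chapter's data. The function returns a fraction, so it would be multiplied by 100 to match the percentages quoted above.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy, made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.1, 1.9, 3.2, 3.9]   # small errors around the observations
mean_only = [2.5, 2.5, 2.5, 2.5]   # always predicts the observed mean

r2_close = r_squared(observed, close_fit)
r2_mean = r_squared(observed, mean_only)
print(f"{100 * r2_close:.2f}%  {100 * r2_mean:.2f}%")
```

A model that always predicts the observed mean scores zero, which is the baseline against which the reported percentages should be read.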

Moreover, **Figure 3** gives additional information on model performance beyond the error metrics presented above. In theory, all points should lie close to the diagonal line, which is the case for the CB algorithm.

Finally, several residual plots have been presented above to support the analysis of the predictive models. In the plot of residuals vs. observed values in **Figure 4**, the CB method clearly outperforms the MLR method in prediction accuracy, since the CB residuals are more tightly clustered around the x-axis than the MLR residuals.

In addition, **Figure 5** shows that the CB residuals are more concentrated around zero than those of MLR. Furthermore, the residual boxplots in **Figure 6** show that CB has the narrowest spread of residuals, whereas MLR has a much larger range of residuals.
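The quantities that a residual boxplot displays can also be compared numerically. The sketch below computes the median, interquartile range, and full range of the residuals for two hypothetical models; the function name, the model labels, and all numbers are made up for illustration and do not reproduce the chapter's results. It relies on `statistics.quantiles` from the Python standard library with its default settings, which may differ slightly from the quartile rule used by a given plotting package.

```python
from statistics import median, quantiles

def residual_summary(observed, predicted):
    """Summarize residuals (observed - predicted) as a boxplot would."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    q1, _, q3 = quantiles(residuals, n=4)  # quartile cut points
    return {
        "median": median(residuals),
        "iqr": q3 - q1,
        "range": max(residuals) - min(residuals),
    }

# Toy, made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.1, 1.9, 3.0, 4.1, 4.9]  # hypothetical model with small errors
model_b = [1.6, 1.2, 3.5, 3.1, 5.7]  # hypothetical model with larger errors

summary_a = residual_summary(observed, model_a)
summary_b = residual_summary(observed, model_b)
print(summary_a["range"] < summary_b["range"])
```

A narrower interquartile range and a narrower full range both indicate residuals that are more tightly grouped around zero, which is the pattern the boxplot comparison is meant to reveal.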

Taken together, these results show the superiority of the CB algorithm over the classical MLR approach in predicting PV power.
