**4. Findings**

In total, 43,579 data entries were collected from July to December 2021; after cleaning, 43,434 data points were used. Reference values are approximately 400 ppm for carbon dioxide concentration, 10 μg/m³ (annual mean) for PM2.5 dust, and 1013.25 millibars for atmospheric pressure at sea level (see, for example, [18–20]). From historical records, the diurnal temperature range of Singapore is 25 °C to 33 °C, and relative humidity in the island nation typically ranges from 60% to 90%. From June to September, the climate of Singapore is influenced by the southwest monsoon, which is followed by an inter-monsoonal period of relatively weaker winds.

Data were analyzed using the Python libraries scikit-learn (sklearn) and SHAP. Correlations were drawn from a best-fit line plotted between each microclimatic variable and each biometric/well-being variable.
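As an illustration of the best-fit-line step, the following minimal sketch fits a least-squares line and computes a Pearson correlation coefficient for one pair of variables. The data are synthetic, and the variable names (`temperature_c`, `stress_score`) are hypothetical; the chapter does not state which routine was used to fit the lines, so NumPy is assumed here.

```python
import numpy as np

# Synthetic, hypothetical readings for illustration only (not data from the study).
rng = np.random.default_rng(0)
temperature_c = rng.uniform(25.0, 33.0, size=200)  # a microclimatic variable
stress_score = 60.0 + 0.8 * temperature_c + rng.normal(0.0, 2.0, size=200)  # a biometric variable

# Least-squares best-fit line: stress_score ~ slope * temperature_c + intercept
slope, intercept = np.polyfit(temperature_c, stress_score, deg=1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(temperature_c, stress_score)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

The sign of the slope and of `r` always agree for a simple linear fit, so either can be read as the direction of the association.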

Random forest regression models were then fitted. Random forest regression is a supervised learning algorithm that uses ensemble learning for regression. An ensemble learning method, in turn, combines predictions from multiple machine learning models to make a more accurate prediction than a single model; in the study reported here, multiple regression models were used. Finally, outlier events were identified for subsequent investigation if necessary.
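The ensemble idea can be sketched with scikit-learn's `RandomForestRegressor`, in which the individual regression models are decision trees and the forest's prediction is the average of the trees' predictions. The data below are synthetic, and the settings (`n_estimators=50`, three predictors) are assumptions chosen for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only: three hypothetical microclimate predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, size=300)

# Fit an ensemble of 50 regression trees (hyperparameters are assumptions).
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# The forest's prediction is the mean of the individual trees' predictions.
forest_pred = forest.predict(X[:5])
tree_preds = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
mean_of_trees = tree_preds.mean(axis=0)

print(np.allclose(forest_pred, mean_of_trees))
```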

To measure the respective contributions of the various predictors (the microclimate variables) against the actual values, Shapley summary plots were generated from the training data. Shapley values can be thought of as the average of a feature's marginal contributions across all possible permutations of the features within a given model. Simply put, Shapley values decompose a prediction into the contribution of each feature, showing how much each feature contributed to the overall prediction.
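To make the permutation-based definition concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature model. Everything in it (`toy_model`, the feature values, the baseline) is hypothetical; the SHAP library uses more efficient algorithms than this enumeration, which is shown only to illustrate the definition.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features start at their baseline values and are switched to their actual
    values one at a time; each feature's marginal contribution is averaged
    over every possible ordering of the features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [total / len(orderings) for total in totals]


def toy_model(features):
    # Hypothetical model with an interaction between temperature and humidity.
    temperature, humidity, co2 = features
    return 0.5 * temperature + 0.1 * humidity + 0.002 * co2 + 0.01 * temperature * humidity


x = [31.0, 80.0, 450.0]         # observation to explain (made-up values)
baseline = [29.0, 75.0, 400.0]  # reference input (made-up values)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The values in `phi` sum to the difference between the model's prediction at `x` and at `baseline`, which is the sense in which Shapley values decompose a prediction.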

The sections below elaborate on the contribution of each microclimatic variable measured in this study to selected biometric and well-being indicators of interest. Each section presents a Shapley summary plot showing the contributions of the various microclimate variables to the given biometric/well-being indicator.

In the Shapley plots, variables are ranked in descending order of importance, and the position of each dot along the x-axis shows whether the effect of that value is associated with a higher or lower prediction. Simple exponential smoothing was applied to the features used in the models. Each environmental feature had a smoothed counterpart named in the format "(feature)\_", which was derived from the values of that feature in preceding hours, with older values being exponentially less important than the current value. For example, f(t) is affected by f(t-1), f(t-2), …, but f(t-1) carries more weight than f(t-2), which in turn carries more weight than f(t-3), and so on, with the weights decaying exponentially.
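A minimal sketch of simple exponential smoothing follows, using the standard recursion s(t) = α·x(t) + (1 − α)·s(t − 1). The smoothing factor `alpha=0.5`, the initialization with the first observation, and the sample readings are assumptions; the chapter does not report the parameter values it used.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    The first smoothed value is set to the first observation (an assumption).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical hourly temperature readings and their smoothed counterpart,
# named with a trailing underscore to mirror the "(feature)_" convention.
hourly_temperature = [28.0, 30.0, 32.0, 31.0]
temperature_ = exponential_smoothing(hourly_temperature, alpha=0.5)
print(temperature_)  # [28.0, 29.0, 30.5, 30.75]
```

Unrolling the recursion shows that a reading k hours old is weighted by α(1 − α)^k, which is why older values matter exponentially less.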

The features "0", "1", … "23" represent the hours of the day at which data points were recorded, while "31" to "37" are the days of the week (Monday to Sunday) on which they were recorded. These features are binary, so 1 or high means true and 0 or low means false. For example, if a data point was recorded at 6 AM on a Monday, then "6" and "31" would be 1 and the remaining binary features would be 0. Finally, in the Shapley plots, the color shows whether that variable is high (red) or low (blue) for that observation. In this way, Shapley summary plots combine feature importance with feature effects.
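The binary hour-of-day and day-of-week features described above could be constructed along the following lines. The helper `encode_timestamp` is hypothetical and only mirrors the naming scheme given in the text ("0" to "23" for hours, "31" to "37" for Monday to Sunday); it is not the study's own preprocessing code.

```python
from datetime import datetime

HOUR_FEATURES = [str(h) for h in range(24)]     # "0" .. "23"
DAY_FEATURES = [str(d) for d in range(31, 38)]  # "31" (Monday) .. "37" (Sunday)


def encode_timestamp(ts):
    """Return a dict of binary features for the hour and weekday of ts."""
    features = {name: 0 for name in HOUR_FEATURES + DAY_FEATURES}
    features[str(ts.hour)] = 1
    features[str(31 + ts.weekday())] = 1  # datetime.weekday(): Monday == 0
    return features


# 5 July 2021 was a Monday; the reading is taken at 6 AM.
encoded = encode_timestamp(datetime(2021, 7, 5, 6, 0))
active = [name for name, flag in encoded.items() if flag == 1]
print(active)  # ['6', '31']
```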
