### **4. Discussion**

#### **4.1 Presence of auralized and visualized rooms**

Most of the results apply alike to both egocentric distance and room size estimation. RQ 1 asked whether the modalities as such differ. Mean estimates across the rooms based only on visual information significantly and considerably exceeded those based only on acoustic information, specifically by about a fourth of the mean physical property in the case of distance and by about a third in the case of size. Hence, H11 can be accepted and might be reformulated directionally (H11: μ<sup>A</sup> < μ<sup>V</sup>) for future experimentation. Regarding egocentric distance estimation, the finding is plausible in principle, given the reported compression of distance perception in real acoustic environments [27, 28, 31–33] and in virtual acoustic environments [32, 34–36]. However, it does not agree with [36], who observed a compressed perception of visual distances between 1.5 and 5.0 m, or with [18], who used nearly the same auralization system with smaller distances (1.93–5.88 m) and a restricted visualization. Although the general finding *D̂*<sup>V</sup> > *D̂*<sup>A</sup> also does not accord with the finding of [38] under the virtual environment condition (*D̂*<sup>V</sup> < *D̂*<sup>A</sup>), the exceptional observation at the smallest physical distance (*D* = 7.19 m, room JC) does. This is likely due to the same physical distance having been used in [38], indicating that the general finding might be confined to physical distances greater than about 8 m.
However, the inference from virtual rooms to real rooms can be checked for the music content by multiplying the mean estimates of the present study by the reality-to-virtuality ratios (*RVR*s) of the mean estimates of [38] (*RVR*<sub>distance,A</sub> = 1.071, *RVR*<sub>distance,V</sub> = 1.318; *RVR*<sub>size,A</sub> = 1.019, *RVR*<sub>size,V</sub> = 1.236). Doing so allows the findings *D̂*<sup>V</sup> > *D̂*<sup>A</sup> and *Ŝ*<sup>V</sup> > *Ŝ*<sup>A</sup> to be transferred from virtual scenes to corresponding real scenes without the aforesaid scene-specific exception persisting.
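The compensation step can be sketched as a simple scaling. In the following minimal sketch, only the *RVR* values are taken from the text (via [38]); the virtual-scene mean estimates are hypothetical placeholders, not the study's actual data.

```python
# Reality-to-virtuality compensation: scale virtual-scene mean estimates
# by the RVRs reported in [38]. Mean estimates below are hypothetical.

RVR = {
    ("distance", "A"): 1.071, ("distance", "V"): 1.318,
    ("size", "A"): 1.019, ("size", "V"): 1.236,
}

def to_real(virtual_mean: float, prop: str, modality: str) -> float:
    """Scale a mean estimate from a virtual scene to its real counterpart."""
    return virtual_mean * RVR[(prop, modality)]

# Hypothetical virtual-scene means (m) with the ordering D^V > D^A:
d_A, d_V = 9.0, 12.0
print(round(to_real(d_A, "distance", "A"), 3))  # 9.639
print(round(to_real(d_V, "distance", "V"), 3))  # 15.816
# Since both RVRs are positive and RVR_V > RVR_A, the ordering D^V > D^A
# is preserved after compensation.
```

Because the visual *RVR* exceeds the acoustic one, any finding of visual estimates exceeding acoustic estimates in the virtual scenes carries over to the compensated (real-scene) values.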

#### **4.2 Basic mode of perception**

Regarding RQ 2, there is evidence that the basic mode of perception (processing of single- vs. multi-domain stimuli) as such alters perceptual estimates of geometric dimensions in virtual rooms. Mean estimates based on acoustic-visual stimuli did not equal the average of the mean estimates based on acoustic-only or visual-only stimuli. Rather, mean estimates of source distance under the acoustic-visual condition (with acoustic-visually congruent stimuli) were located at 85% (music) of the range between the mean estimates of levels **A** and **V**, and mean estimates of room size at 92% (music) and 94% (speech), indicating that under the multi-domain condition visual information was weighted significantly more heavily than acoustic information. Although the distance estimation for the speech performance did not show a significant effect of perceptual mode, the mean estimates still accounted for 60% of the range between the mean estimates at levels **A** and **V**. When the mean estimates were loaded with the above-mentioned compensation factors, the percentages for music changed from 85% to 84% for source distance and from 92% to 86% for room size. Hence, the finding on RQ 2 may be transferred to reality in principle.
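The "percentage of the range" statistic above can be read as an implicit weight placed on the visual estimate: the position of the acoustic-visual mean within the interval spanned by the single-domain means. A minimal sketch, with hypothetical means rather than the study's data:

```python
# Relative position of the acoustic-visual (AV) mean estimate within the
# range between the acoustic-only (A) and visual-only (V) mean estimates.
# A value of 0 means the AV estimate equals the A estimate; 1 means it
# equals the V estimate. Input values below are hypothetical.

def visual_weight(mean_A: float, mean_V: float, mean_AV: float) -> float:
    """Position of the AV estimate in the A..V range (0 = A, 1 = V)."""
    return (mean_AV - mean_A) / (mean_V - mean_A)

# Hypothetical means (m): an AV estimate at 85% of the A-to-V range,
# i.e. much closer to the visual-only estimate.
print(round(visual_weight(8.0, 12.0, 11.4), 2))  # 0.85
```

A value near 1 corresponds to the finding that visual information dominates under the multi-domain condition, while 0.5 would indicate equal weighting of the two modalities.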
