*Advances in Fundamental and Applied Research on Spatial Audio*

proportions found in the present study cannot directly be compared to the weights determined in [79], which are based solely on distance-related cues, those interfering factors would have to be experimentally dissociated and, where applicable, included in physical-perceptual models of auditory-visual distance perception.

**4.5 Additional observations**

There were some additional results on factors and measures that were not explicitly asked for by the RQs:

a. Both egocentric distance and room size mean estimates, regardless of whether they were based on acoustic, visual or acoustic-visual stimuli, were noticeably lower for speech than for music (though this was neither hypothesized nor tested, see 2.7). Hence, there is reason to hypothesize an influence of content type. This might be due to differences between music and speech regarding, e.g., the bandwidth and energy distribution of the frequency spectra carrying spatial information, perceptual filtering and processing, receptiveness, and/or experiential geometric situations (non-mediatized speech is normally received from shorter distances and within smaller rooms than non-mediatized music).

b. Both the non-significant interaction effect and the particular mean estimates in the experiment following the conflicting stimulus paradigm indicated that acoustic-visual (mainly spatial) congruency of the stimulus properties did not lead to minimum, maximum or especially accurate mean estimates. This observation is not apt to constitute a general hypothesis, since congruency might play a greater role given a greater range of incongruencies (e.g., sound sources further away) or a greater number of incongruent properties (e.g., including incongruent content).

c. Egocentric distance mean estimates were most accurate under the acoustic-visual (music) and visual (speech) conditions; the room size mean estimates, which were generally inaccurate, likely due to the lack of a visual rendering of the rooms' rear part, were most accurate under the acoustic condition. In contrast to previous studies [32, 36], regardless of general under- or overestimations of the geometric properties ($\hat{D}/D \neq 1$) under the acoustic-visual condition, neither an *increasing* underestimation nor an *increasing* overestimation was conspicuous; rather, $\hat{D}/D \approx \text{const}$.

d. In the conflicting stimulus paradigm, the minimum and maximum mean estimates of both source distance and room size did not consistently correspond to the minimum and maximum physical distances and sizes. *Perceived source distance* and *perceived room size* were each influenced by the physical source distance, the physical room size and potentially other properties of the virtual scenes.

e. Because mean estimates based on purely acoustic stimuli were generally higher in low-absorbent than in high-absorbent rooms (cf. [18]), the range of mean estimates introduced by the factor *Domain* was also generally smaller in low-absorbent rooms. This caused the respective mean estimates under the […]

**5. Conclusion**

The influence of the presence as well as of the properties of acoustic and visual information on the perceived egocentric distance and room size was investigated using both a co-presence and a conflicting stimulus paradigm. Constant music and speech renditions in six different rooms were presented using dynamic binaural synthesis and stereoscopic semi-panoramic video projection. The experiments corroborated that, in general, perceptual mean estimates of geometric dimensions based on visual information alone considerably exceeded those based on acoustic information alone. However, the perceptual mode as such (single- vs. multi-domain stimuli) altered the perceptual estimates of geometric dimensions: Under the acoustic-visual condition with acoustic-visually congruent stimuli, the presence of visual geometric information was generally given more weight than the presence of acoustic information. While the egocentric distance estimation under the acoustic-visual condition did not tend to be compressed for music, it did for speech. When only acoustic stimuli were available, the greater amount of acoustic information provided by low-absorbent rooms appeared to be perceptually exploited to improve the accuracy of room size perception. Within the multi-domain mode of perception involving 30 acoustic-visually incongruent and 6 congruent stimuli, auditory-visual estimation of geometric dimensions in rooms relied about nine-tenths on the variation of visual properties, about one-tenth on the variation of acoustic properties, and negligibly on the interaction of the two. Both the auditory and the visual sensory systems contribute to the perception of geometric dimensions in a straightforward manner. The observation of generally lower estimates for speech than for music needs to be corroborated and clarified.
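Whether a distance estimation is "compressed" can be illustrated with a power-function model $\hat{D} = k D^{a}$, where $D$ is the physical and $\hat{D}$ the estimated distance. This model is often used in the distance-perception literature, but it is an assumption here and not necessarily the analysis applied in this study; the function name `fit_power_function` and the example distances are hypothetical. An exponent $a < 1$ means that underestimation grows with distance (compression), whereas $a \approx 1$ corresponds to $\hat{D}/D \approx \text{const}$.

```python
import math


def fit_power_function(physical, estimated):
    """Fit estimated = k * physical**a by least squares in log-log space.

    Hypothetical helper (assumed power-function model). Returns (k, a):
    a < 1 indicates compression (underestimation grows with distance),
    a close to 1 means the ratio estimated/physical is roughly constant.
    """
    if len(physical) != len(estimated) or len(physical) < 2:
        raise ValueError("need at least two paired, positive values")
    xs = [math.log(d) for d in physical]
    ys = [math.log(e) for e in estimated]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("physical values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope in log-log space = exponent
    k = math.exp(mean_y - a * mean_x)  # intercept back-transformed = factor
    return k, a


if __name__ == "__main__":
    # Hypothetical physical distances in metres (not data from this study).
    physical = [2.0, 4.0, 8.0, 16.0]
    constant_ratio = [0.8 * d for d in physical]    # D^/D = 0.8 at every distance
    compressed = [1.5 * d ** 0.5 for d in physical]  # underestimation grows with D
    for label, est in (("constant ratio", constant_ratio), ("compressed", compressed)):
        k, a = fit_power_function(physical, est)
        print(f"{label}: k = {k:.2f}, a = {a:.2f}")
```

For these toy values the fit yields $a = 1$ for the constant-ratio case and $a = 0.5$ for the compressed case; fitting in log-log space keeps the sketch to ordinary least squares without external libraries.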
Further experimentation dissociating the factors source distance, room size, and acoustic absorption (all else being equal) is needed to clarify their particular influence on auditory-visual distance and room size perception.
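One way to read the "nine-tenths visual, one-tenth acoustic" split is as a partition of the variation of the mean estimates across the 6 × 6 combinations of visual and acoustic rooms (6 congruent on the diagonal, 30 incongruent elsewhere) into a visual main effect, an acoustic main effect and their interaction. The sketch below assumes a balanced two-way table with one mean estimate per cell, so the interaction is the residual of the additive model; the exact statistical procedure used in the study is not specified in this section, and the function name `variance_shares` and the toy numbers are hypothetical.

```python
def variance_shares(cell_means):
    """Partition the variation of a balanced two-way table of cell means.

    Hypothetical helper. cell_means[i][j] is the mean estimate for visual
    room i combined with acoustic room j. Returns the shares of the total
    sum of squares due to the visual factor (rows), the acoustic factor
    (columns) and their interaction (residual of the additive model).
    """
    rows = len(cell_means)
    cols = len(cell_means[0]) if rows else 0
    if rows < 2 or cols < 2 or any(len(r) != cols for r in cell_means):
        raise ValueError("need a rectangular table with at least 2 x 2 cells")
    grand = sum(sum(r) for r in cell_means) / (rows * cols)
    row_means = [sum(r) / cols for r in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(rows)) / rows for j in range(cols)]
    ss_visual = cols * sum((m - grand) ** 2 for m in row_means)
    ss_acoustic = rows * sum((m - grand) ** 2 for m in col_means)
    ss_interaction = sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    # In a balanced table the three parts add up to the total sum of squares.
    ss_total = ss_visual + ss_acoustic + ss_interaction
    if ss_total == 0:
        raise ValueError("all cell means are equal; shares are undefined")
    return {
        "visual": ss_visual / ss_total,
        "acoustic": ss_acoustic / ss_total,
        "interaction": ss_interaction / ss_total,
    }


if __name__ == "__main__":
    # Hypothetical, purely additive 6 x 6 table (6 visual x 6 acoustic rooms).
    visual_effect = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0]
    acoustic_effect = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    table = [[v + a for a in acoustic_effect] for v in visual_effect]
    shares = variance_shares(table)
    print({name: round(share, 3) for name, share in shares.items()})
```

With this toy table the shares come out as 0.9, 0.1 and 0.0, resembling the reported proportions only because the numbers were chosen that way; real data would also require inferential statistics that this sketch does not cover.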
