**2.5 Stimuli**

As far as possible in a virtual environment, maximum ecological validity of the stimuli was sought by selecting dedicated performance rooms, artistic content, and professional music and speech performers.

#### *The Influences of Hearing and Vision on Egocentric Distance and Room Size Perception… DOI: http://dx.doi.org/10.5772/intechopen.102810*

Six performance rooms differing in volume (low, medium, high) and average acoustic absorption coefficient (low: αmean(Sabine) < 0.2; high: αmean(Sabine) ≥ 0.2) were selected. Taking into account good speech intelligibility and an accurate perceptibility of the physical room properties (e.g., the visibility of the ceiling height), optimum receiver positions were defined. Based on geometric measures acquired in situ, models of the interior spaces, including the source-receiver arrangements, were built using the software *SketchUp* (by Google/Trimble) and the plugin *Volume Calculator* (by TGI); the volumes and surface areas of the rooms were then calculated. Standard acoustic measurements were taken in situ in accordance with DIN EN ISO 3382-1 [87]. To corroborate the rooms' selection according to the absorption criterion ex post, Sabine absorption coefficients were calculated from the reverberation times and the geometric properties [88]. The air absorption effect was included; attenuation coefficients were taken from [89]. **Table 2** presents geometric and material properties. Distances were measured directly (i.e., not necessarily in the horizontal plane) from the acoustic center of the central sound source to the interaural center of the head and torso simulator; they all cover the extrapersonal space. Detailed acoustic measurement reports (research data) are available [90].
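The ex-post absorption check follows from rearranging Sabine's formula T = 0.163·V/A, with the equivalent absorption area A = αmean·S + 4mV (the 4mV term covering air absorption with attenuation coefficient m). A minimal sketch of this rearrangement; the numbers are illustrative, not the measured values of Table 2:

```python
def sabine_alpha_mean(T, V, S, m=0.0):
    """Mean Sabine absorption coefficient from reverberation time T [s],
    room volume V [m^3], total surface area S [m^2], and air attenuation
    coefficient m [1/m] (m=0 ignores air absorption).
    Rearranged from Sabine's formula T = 0.163 * V / (alpha*S + 4*m*V)."""
    A_total = 0.163 * V / T   # total equivalent absorption area [m^2]
    A_air = 4.0 * m * V       # air absorption contribution [m^2]
    return (A_total - A_air) / S

# Illustrative values only (not one of the rooms in Table 2):
alpha = sabine_alpha_mean(T=2.0, V=12000.0, S=4000.0, m=0.0015)
print(f"alpha_mean(Sabine) = {alpha:.3f}")  # < 0.2 would classify as 'low'
```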

The artistic content comprised a musical work and a text, which were chosen to support the perceptibility of the specific room properties by featuring, e.g., impulsivity and sufficient pauses. Two-minute excerpts of Claude Debussy's String Quartet in G minor, Op. 10 (2nd movement), and of Rainer Maria Rilke's 1st Duino Elegy were selected. The artistic renditions were audio recorded in the anechoic room of the Technische Universität Berlin.

The performances were presented in the Virtual Concert Hall at Technische Universität Berlin, which provides virtual acoustic and visual 3D renditions of rooms. It was particularly designed to meet the methodological requirements (2.1, 2.3) and was completely based on directional binaural room impulse responses (BRIRs) and stereoscopic panoramic images acquired in situ by means of the head and torso simulator *FABIAN* [91, 92]. The stimulus reproduction applied dynamic binaural synthesis by means of an extra-aural headset and a semi-panoramic active stereoscopic video projection featuring an effective physical resolution of 4812 × 1800 pixels (**Figure 1**).
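Conceptually, dynamic binaural synthesis selects the BRIR pair matching the tracked head orientation and convolves it with the anechoic recording. The sketch below illustrates this principle only; the function name and the toy BRIR set are invented for illustration, and the real system additionally used low-latency partitioned convolution and cross-fading:

```python
import numpy as np

def render_binaural(anechoic, brirs, yaw_deg):
    """Schematic dynamic binaural synthesis: choose the BRIR pair whose
    measurement angle is nearest to the tracked head yaw (here a 1-degree
    grid) and convolve the anechoic signal with it.
    brirs: dict mapping integer yaw angle -> (left_ir, right_ir)."""
    key = int(round(yaw_deg)) % 360
    ir_left, ir_right = brirs[key]
    return np.stack([np.convolve(anechoic, ir_left),
                     np.convolve(anechoic, ir_right)])

# Toy BRIRs (equal-length impulse responses), purely illustrative:
brirs = {a: (np.array([1.0, 0.0]), np.array([0.0, 0.5])) for a in range(360)}
out = render_binaural(np.array([1.0, 0.0, 0.0]), brirs, yaw_deg=12.3)
print(out.shape)  # (2, 4)
```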

The used BRIRs contained the fixed HRTFs of *FABIAN*, hence non-individual HRTFs with regard to the listeners. Experimentation showed that head tracking in connection with non-individual HRTFs improves externalization [93], virtually eliminates front/back confusion, and substantially reduces elevation errors [94]. The auralization system used for this study included head tracking with an angular resolution of 1° and an angular range of 80°, which was proved sufficient [95, 96]. It also compensated for spectral coloration [97]. Experimentation also showed that non-individual HpTF compensation, as applied for the present study, outperforms individual HpTF compensation in the specific case of non-individual binaural recordings [98]. System latency was minimized to a level below the perceptual threshold [99]. Cross-fade artifacts were reduced by the applied rendering algorithm fwonder. The system also allowed for the adaptation to the participants' individual ITDs [100].

**Figure 1.** *Participant in the Virtual Concert Hall (visual condition: KO).*

*Advances in Fundamental and Applied Research on Spatial Audio*

The sample size had to be geared to the small 3 × 6 co-presence design. To statistically reveal a relatively small effect size (*f* = 0.15) at a type I error level of α = 0.05 and a test power of 1 − β = 0.95, while assuming a correlation amongst the repeated measurements of *r* = 0.6 and an optional nonsphericity correction of ε = 0.7, the minimum sample size per group accounted for *n* = 38. A total of 114 subjects with a self-reported affinity to music were initially recruited for the experiment. Subjects were excluded in the following cases (multiple incidences possible):

• Hypoacusis; criterion: audiogram, hearing threshold >20 dB HL at either ear at any of seven tested frequency bands (125 to 8000 Hz), uncompensated by hearing aid (0 subjects).

• Vision deficits; criterion: self-reported deficits, uncompensated by visual aid (0 subjects).

• Loss of stereopsis; criterion: unpassed contour stereopsis test using the shutter glasses of the projection system (4 subjects).

• Red and/or green color blindness; criterion: unpassed Ishihara tests for protanomaly and deuteranomaly (3 subjects).

• Technical incident; failure of saving response data (6 subjects).

• Subjectively untrue responses; criterion: implausible perceptual bias (factor ≥ 5) with reference to visual geometric dimensions (14 subjects, most frequent response: "0 m").

The resultant valid net sample sizes accounted for *n* = 50 for the music group and *n* = 38 for the speech group, comprising 32 female and 56 male voluntary nonexperts aged from 21 to 65 years. The frequencies of the participants within the age classes (20s, 30s, 40s, 50s, 60s) amount to *f*abs = {36; 24; 13; 10; 5}. Participants did not receive incentives.

The virtual environment did not provide auditory motion parallax cues, since it did not support lateral motion interactivity and rendering. This was due to limited in-situ acquisition times in the performance rooms: lateral interactivity would have required measurements at several additional positions of the head and torso simulator, depending on the content-specific minimum audible BRIR grid [101, 102], and thus would have multiplied the expenditure of acquisition time beyond the rooms' availability. However, auditory motion parallax, describing the change in the angular direction of a distant sound source due to the movement of the listener, is assumed to be a supporting cue in absolute distance estimation [103] and known to be a cue in relative depth estimation [104]. Regarding a distance range within the personal space, it was demonstrated by means of a depth discrimination task, and under exclusion of all other distance cues, that listeners exploit auditory motion parallax, allowing for the perception of distance differences of unknown acoustic stimuli [104]. The cue was shown to be effective for distances between 0.3 and 1.0 m and to be exploitable for lateral head movements within a range of 46 cm. The participants' sensitivity was highest during self-induced motion. Even sensitive subjects did not perceive distance differences corresponding to angular displacements below 3.2°. This value is higher than the minimum audible movement angles (MAMAs) found in previous research (see [105] for an overview). Regarding a distance range of 1 to 10 m, Rummukainen and colleagues determined the self-translation minimum audible angle (ST-MAA) to be 3.3° by means of 2AFC discrimination tasks without an external reference [106].

Taking into account the absence of external references in the present study and applying the ST-MAA to the nearest sound source used (7.19 m), a concertgoer would remain below the perceptual threshold within a lateral moving range of 41.5 cm, which corresponds to 150% of a typical concert seat's width. Respective lateral movements are normally not observed amongst visitors of classical concerts. Since a relative lateral shift of the listener above the perceptual threshold is a precondition for yielding distance information from the auditory motion parallax cue by triangulation, we expect neither an appreciable bias nor a deterioration of the accuracy of distance perception introduced by the absence of lateral motion interactivity and rendering.
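The threshold argument above is plain triangulation: a lateral shift s of a listener at distance d subtends the angle arctan(s/d) at the source, which must exceed the ST-MAA to become audible. A quick check with the values from the text:

```python
import math

def max_inaudible_shift(distance_m, st_maa_deg=3.3):
    """Largest lateral listener shift that stays below the
    self-translation minimum audible angle (ST-MAA)."""
    return distance_m * math.tan(math.radians(st_maa_deg))

# Nearest sound source of the study (7.19 m):
print(round(max_inaudible_shift(7.19), 3))  # ~0.415 m, i.e., 41.5 cm
```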

As a result, the Virtual Concert Hall at Technische Universität Berlin provided almost all relevant auditory cues without major biases (rich-cue condition). The exceptions are the missing support for (rarely performed and normally small) head orientations around the pitch and roll axes.

The sound pressure level of the virtual rendition was adjusted to that of a live rendition of a string quartet in a real room, which was recorded by the calibrated head and torso simulator. Accounting for the gain of the signal chain and the rooms' STI measures, the scenes' average sound pressure level at the blocked ear canal was *L*<sub>p</sub> = 72.5 dB SPL for a selected *mezzoforte* passage. Likewise, the speech's sound pressure level was adapted to a rendition in a real room and averaged out at *L*<sub>p</sub> = 59.5 dB SPL for a moderate declamatory dynamics stage.

The acquisition of the visual rendering data applied a fixed stereo base, which does not necessarily accord with the participants' individual interpupillary distances (IPDs). Respective differences might potentially bias the individual distance and room size perception. To date, experimentation has shown inconsistent effects of the variation of IPD differences on distance perception (see [46] for a review). Most studies cannot be translated into the present study, since they investigated maximum target distances of 1 m and/or used simple numerically modeled objects/environments. Moreover, results differ regarding the significance, the size, and/or the direction of the effects. This is apparently due to different rendering technologies (stereoscopic projection, HMD, CAVE), stages of virtualization (mixed reality, virtual reality), target distances (personal space, action space), simulated objects/environments (simple graphic objects, shapes, persons in hallways), and measurement protocols (triangulated distance estimation, blind walking, visual alignment, verbal estimation) [107–113]. Few experiments investigated distances roughly similar to those used in the present study (about 7 to 16 m). While Willemsen and colleagues did not observe a significant effect of IPD individualization on distance judgments [114], a large variation of the stereo base (0 to 4 times the IPD) showed significant effects on both distance and size judgments: greater stereo bases resulted in perceptually closer and smaller objects [115]. However, relevance for the descriptive measures, effect sizes, and significances of the present study is given rather by the expected value and distribution of the IPD differences than by their individual values. Anthropometric data of the German resident population, from which the sample was drawn, state median IPDs of 61 mm (male persons) and 60 mm (female persons) within the age range of 18 to 65 years [116]. Since these values almost exactly match the stereo base of the target acquisition (60 mm), a substantial collective perceptual bias is unlikely to occur.
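To get a feel for the magnitude involved, one can compare the disparity angle captured with the 60 mm stereo base to the disparity the median viewer's IPD would subtend naturally at the same distance. The small-angle geometry below is an illustrative simplification, not part of the study's method:

```python
import math

def disparity_deg(base_m, distance_m):
    """Disparity angle subtended by a stereo base (or IPD) at a given
    distance, using simple small-angle geometry."""
    return math.degrees(2 * math.atan(base_m / (2 * distance_m)))

captured = disparity_deg(0.060, 10.0)  # 60 mm capture stereo base
natural = disparity_deg(0.061, 10.0)   # 61 mm median IPD (male)
rel = (natural - captured) / captured
print(f"relative disparity mismatch: {rel:.1%}")  # 1.7%
```

The mismatch of under two percent supports the text's conclusion that no substantial collective bias is to be expected.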

Limitations of the visual rendering pertain to the field of view (161° × 56°), which should at least not affect distance perception [59, 117]; the angular resolution (2.1 arcmin), which might affect distance perception [57]; the fixed single focal plane in stereoscopy providing an invariant accommodation cue, so that the connection between convergence and accommodation is suspended [45]; and an undersized luminance of the projection. Data projectors could not provide the luminance and the contrast of the real scenes, especially in connection with shutter glasses. Thus, the luminances of the scenes were fitted into the projectors' dynamic range while maintaining compressed relations of the luminances. Scene luminances were calculated from the exposure time, aperture, and ISO arithmetic film speed of correctly exposed photographs of a centrally placed and vertically oriented 18% gray card, according to the additive system of photographic exposure (APEX). The average loss of the luminance value *B*<sub>v</sub> introduced by the projection and shutter glasses was 2.88. The average scene luminance *L*<sub>v</sub> of the gray cards amounted to 0.82 cd/m². Detailed information regarding room acquisition, content production, and stimulus reproduction for the Virtual Concert Hall was published separately [118].
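The APEX calculation referred to above follows Bv = Av + Tv − Sv, with aperture value Av = log2(N²) for f-number N, time value Tv = log2(1/t) for exposure time t, and speed value Sv = log2(Sx/3.125) for the arithmetic ISO speed Sx; Bv = 0 corresponds to roughly 3.43 cd/m² (1 footlambert). A sketch with invented camera readings, not the actual measurement data:

```python
import math

def luminance_cd_m2(f_number, exposure_s, iso):
    """Scene luminance via APEX: Bv = Av + Tv - Sv, where Bv = 0
    corresponds to 1 footlambert (about 3.4263 cd/m^2)."""
    av = math.log2(f_number ** 2)    # aperture value
    tv = math.log2(1.0 / exposure_s) # time value
    sv = math.log2(iso / 3.125)      # speed value (arithmetic ISO)
    bv = av + tv - sv                # luminance value
    return 3.4263 * 2.0 ** bv

# Illustrative reading: f/2.0, 1/30 s, ISO 400
print(round(luminance_cd_m2(2.0, 1 / 30, 400), 2))  # 3.21
```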

Since electronic media transform both the physical stimuli and their perception, the replacement of natural by mediatized stimuli for serious experimental purposes demands knowledge of the perceptual influences of the applied mediatizing system, as also pointed out by [16, 21]. The rendering technique of the Virtual Concert Hall was shown to provide perceptually plausible auralizations [119]. Specifically, the Virtual Concert Hall at Technische Universität Berlin was subjected to an auditory-visual validation test comparing a real scene and the corresponding virtual scene [38]. Amongst other results, it yielded nearly equal loudness judgments of the real and the virtual environment, whereas the virtual environment (apparently due to the dark surrounding) was perceived as slightly brighter than the respective real environment. The virtualization also generally lowered the perceived source distance and the perceived size of a real room, mainly due to the visual rendering. The purely auditory underestimation of source distance and room size introduced by the virtualization amounted to only 6.6 and 1.9%, respectively. The biases are considered in the discussion section.

#### **2.6 Procedure**

Each participant ran through the test procedure individually. The procedure lasted about 3 hours and 10 minutes and comprised color vision and stereopsis tests, audiometry, a socio-demographic questionnaire, a privacy agreement, the clarification of the questionnaire, the measurement of the individual inter-tragus distance (necessary for the technical adaptation to the individuals' ITDs), cabling, a familiarization sequence, and the actual test runs, inclusive of self-imposed breaks.

#### **2.7 Data analysis**

Arithmetic means ± standard deviations (**Tables 11** and **12**) and standard errors were calculated for all combinations of factor levels. The means were plotted against the combinations. According to the test design (2.3), the co-presence paradigm required 3 × 6 repeated measures analyses of variance (rmANOVA), the conflicting stimulus paradigm 6 × 6 rmANOVA for either level of *Content*. *Content* was not regarded as a factor for analysis because it was not covered by the RQs, and the quantification of the proportions according to RQs 3–5 was to be made possible separately for both music and speech. Kolmogorov-Smirnov tests indicated that the assumption of normally distributed error components was met, with the exceptions of source distance under the conditions speech **A0**-**V5** (KS-*Z* = 1.390, *p* = 0.042) and speech **A6**-**V3** (KS-*Z* = 1.442, *p* = 0.031), and of room size under the conditions speech **A0**-**V5** (KS-*Z* = 1.500, *p* = 0.022), speech **A5**-**V0** (KS-*Z* = 1.759, *p* = 0.004), and music **A4**-**V5** (KS-*Z* = 1.428, *p* = 0.034). These minor violations, concerning 4.8% of the conditions, were deemed tolerable because of the robustness of the rmANOVA. Mauchly's sphericity tests indicated a significant violation of the sphericity assumption in both the 3 × 6 and the 6 × 6 analyses, which was compensated for by correcting the degrees of freedom using Greenhouse-Geisser estimates. To answer RQs 1 and 2, an orthogonal set of planned main contrasts (reverse Helmert) was calculated: simple contrast **V** vs. **A**; combined contrast **VA** vs. {**V**, **A**}. To allow different approaches to effect size comparison, partial eta squared η²P, classical eta squared η², and generalized eta squared η²G [120, 121] were reported for the omnibus tests.

Because of RQs 3–5, and taking advantage of the commensurability of the factors *Auralized room* and *Visualized room* of the conflicting stimulus design, the η² effect sizes were particularly reported as indicators for the proportional influence of the acoustic room properties, the visual room properties, and their interaction on the geometric features. To allow their direct comparison in a simplified manner, the net effect sizes (the proportions of the explained variance), given by η²X(net) = η²X / (η²A + η²V + η²A×V), were also reported. Based on Cohen's *f* ([122], p. 281), which was calculated from η² ([123], p. 7), the effect sizes were classified as small, medium, or large.
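The effect-size conversions used here can be made explicit: Cohen's f = sqrt(η²/(1−η²)), and the net effect sizes simply normalize the three explained-variance components so that they sum to one. A sketch; the net-effect inputs below are illustrative, while the f value reproduces the one reported for *Room* (music) in Table 3:

```python
import math

def cohens_f(eta_sq):
    """Cohen's f from (classical) eta squared."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))

def net_effect_sizes(eta_a, eta_v, eta_av):
    """Net effect sizes: each eta^2 as a proportion of the variance
    explained jointly by A, V, and A x V; the three shares sum to 1."""
    total = eta_a + eta_v + eta_av
    return {"A": eta_a / total, "V": eta_v / total, "AxV": eta_av / total}

print(round(cohens_f(0.26), 3))                           # 0.593 (Room, music)
print(round(net_effect_sizes(0.10, 0.30, 0.05)["V"], 3))  # illustrative: 0.667
```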

## **3. Results**

#### **3.1 Perceived source distance**

#### *3.1.1 Co-presence paradigm*

Source distance showed significant main and interaction effects of *Domain* and *Room* for both music (**Table 3**) and speech (**Table 4**). Effects were large for *Room* and of medium size for *Domain* and *Domain* × *Room*. The mean distance estimates were generally lower for speech than for music, and the range of the mean estimates introduced by the factor *Domain* was lower for the low-absorbent (wet) and higher for the high-absorbent (dry) rooms, even though this was not hypothesized or tested (**Figures 2** and **3**; **Tables 11** and **12**).

Regarding RQ 1, a priori main contrasts indicate that the mean estimates at level **V** were considerably higher than those at level **A**. The mean differences account for 2.95 m (music), *F*(1,49) = 52.910, *p* < 0.001, η²P = 0.519, and for 2.38 m (speech), *F*(1,37) = 32.712, *p* < 0.001, η²P = 0.469. This is also consistent on a descriptive basis

**Table 3.**

| **S. o. V.** | *SS* | *df*adj | *MS* | *F* | *p* | **η²** | *f* | **η²G** | **η²P** | **1−β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Domain* | 1521.061 | 1.782 | 853.503 | 36.965 | <0.001 | 0.086 | 0.306 | 0.122 | 0.430 | >0.999 |
| *Room* | 4610.610 | 3.845 | 1199.166 | 137.464 | <0.001 | 0.260 | 0.593 | 0.296 | 0.737 | >0.999 |
| *Domain* × *Room* | 597.939 | 7.113 | 84.059 | 9.593 | <0.001 | 0.034 | 0.187 | 0.052 | 0.164 | >0.999 |
| Error (*Domain*) | 2016.285 | 87.325 | 23.090 | | | | | | | |
| Error (*Room*) | 1643.487 | 188.397 | 8.724 | | | | | | | |
| Error (*D.* × *R.*) | 3054.229 | 348.552 | 8.763 | | | | | | | |

*Results of the rmANOVA for* perceived source distance *D̂ (music, co-presence paradigm).*

**Table 4.**

| **S. o. V.** | *SS* | *df*adj | *MS* | *F* | *p* | **η²** | *f* | **η²G** | **η²P** | **1−β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Domain* | 655.466 | 1.683 | 389.381 | 23.350 | <0.001 | 0.058 | 0.248 | 0.073 | 0.387 | >0.999 |
| *Room* | 1712.901 | 3.387 | 505.745 | 41.676 | <0.001 | 0.152 | 0.423 | 0.171 | 0.530 | >0.999 |
| *Domain* × *Room* | 639.600 | 5.709 | 112.027 | 10.836 | <0.001 | 0.057 | 0.245 | 0.072 | 0.227 | >0.999 |
| Error (*Domain*) | 1038.621 | 62.284 | 16.676 | | | | | | | |
| Error (*Room*) | 1520.729 | 125.315 | 12.135 | | | | | | | |
| Error (*D.* × *R.*) | 2183.952 | 211.245 | 10.338 | | | | | | | |

*Results of the rmANOVA for* perceived source distance *D̂ (speech, co-presence paradigm).*

**Figure 2.**

*Means (markers) and standard errors (bars) of* perceived source distance *D̂ against factor levels of* Room *and* Domain *for music. Horizontal lines indicate the particular physical source distance D within each room. Bold labels indicate low-absorbent rooms.*
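As a consistency check, the partial eta squared values reported in the tables can be reproduced from the sums of squares as η²P = SS_effect / (SS_effect + SS_error); for example, for *Domain* (music):

```python
def partial_eta_sq(ss_effect, ss_error):
    """Partial eta squared from the sums of squares of an rmANOVA
    effect and its associated error term."""
    return ss_effect / (ss_effect + ss_error)

# Domain effect for music: SS = 1521.061, error SS = 2016.285
print(round(partial_eta_sq(1521.061, 2016.285), 3))  # 0.43, matching the reported 0.430
```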


#### **Table 3.**

tests, audiometry, a socio-demographic questionnaire, a privacy agreement, the clarification of the questionnaire, the measurement of the individual inter-tragus distance (necessary for the technical adaption to the individuals' ITDs), cabling, a familiarization sequence, and the actual test runs, inclusive of self-imposed breaks.

*Advances in Fundamental and Applied Research on Spatial Audio*

#### **2.7 Data analysis**

Arithmetic means, standard deviations (**Tables 11** and **12**), and standard errors were calculated for all combinations of factor levels. The means were plotted against the combinations. According to the test design (2.3), the co-presence paradigm required 3 × 6 repeated measures analyses of variance (rmANOVA), the conflicting stimulus paradigm 6 × 6 rmANOVA for either level of *Content*. *Content* was not regarded as a factor for analysis because it was not covered by the RQs, and the quantification of the proportions according to RQs 3–5 was to be made possible separately for both music and speech. Kolmogorov-Smirnov tests indicated that the assumption of normally distributed error components was met with the exceptions of source distance under the conditions speech **A0**-**V5** (KS-*Z* = 1.390, *p* = 0.042) and speech **A6**-**V3** (KS-*Z* = 1.442, *p* = 0.031), and of room size under the conditions speech **A0**-**V5** (KS-*Z* = 1.500, *p* = 0.022), speech **A5**-**V0** (KS-*Z* = 1.759, *p* = 0.004), and music **A4**-**V5** (KS-*Z* = 1.428, *p* = 0.034). The minor violations concerning 4.8% of the conditions were deemed tolerable because of the robustness of the rmANOVA. Mauchly's sphericity tests indicated a significant violation of the sphericity assumption in both the 3 × 6 and the 6 × 6 analyses, which was compensated for by correcting the degrees of freedom using Greenhouse-Geisser estimates. To answer RQs 1 and 2, an orthogonal set of planned main contrasts (reverse Helmert) was calculated: simple contrast **V** vs. **A**; combined contrast **VA** vs. {**V**, **A**}. To allow different approaches to effect size comparison, partial eta squared η<sup>2</sup><sub>P</sub>, classical eta squared η<sup>2</sup>, and generalized eta squared η<sup>2</sup><sub>G</sub> [120, 121] were reported for the omnibus tests. Because of RQs 3–5, and taking advantage of the commensurability of the factors *Auralized room* and *Visualized room* of the conflicting stimulus design, the η<sup>2</sup> effect sizes were particularly reported as indicators for the proportional influence of the acoustic room properties, the visual room properties, and their interaction on the geometric features. To allow their direct comparison in a simplified manner, the net effect sizes (the proportions of the explained variance), given by η<sup>2</sup><sub>X(net)</sub> = η<sup>2</sup><sub>X</sub> / (η<sup>2</sup><sub>A</sub> + η<sup>2</sup><sub>V</sub> + η<sup>2</sup><sub>A×V</sub>), were also reported. Based on Cohen's *f* ([122], p. 281), which was calculated from η<sup>2</sup> ([123], p. 7), the effect sizes were classified as small, medium or large.

#### **3. Results**

#### **3.1 Perceived source distance**

(**Figures 2** and **3**; **Tables 11** and **12**).

#### *3.1.1 Co-presence paradigm*

Source distance showed significant main and interaction effects of *Domain* and *Room* for both music (**Table 3**) and speech (**Table 4**). Effects were large for *Room* and of medium size for *Domain* and *Domain* × *Room*. The mean distance estimates were generally lower for speech than for music, and the range of the mean estimates introduced by the factor *Domain* was lower for the low-absorbent (wet) and higher for the high-absorbent (dry) rooms, even though it was not hypothesized or tested.
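The two derived quantities used above, the net effect sizes and Cohen's *f*, can be sketched in a few lines. The following Python snippet is illustrative only (the function names are ours, not from the chapter); it uses the classical η² values reported for the music condition of the conflicting stimulus paradigm (**Table 5**) as sample input:

```python
import math

def cohens_f(eta_sq):
    """Cohen's f from a (classical) eta squared: f = sqrt(eta2 / (1 - eta2))."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))

def net_effect_sizes(eta_a, eta_v, eta_av):
    """Net effect sizes: each term's share of the total explained variance."""
    total = eta_a + eta_v + eta_av
    return {"A": eta_a / total, "V": eta_v / total, "AxV": eta_av / total}

# Classical eta2 values as printed for music, conflicting stimulus paradigm (Table 5)
net = net_effect_sizes(0.017, 0.233, 0.005)
print({k: round(100 * v) for k, v in net.items()})  # → {'A': 7, 'V': 91, 'AxV': 2}
print(round(cohens_f(0.017), 3))  # → 0.132 (Table 5 prints 0.133; the difference reflects rounding of the input)
```

The recovered shares of 7% and 91% match the proportions reported for *Auralized room* and *Visualized room* in 3.1.2.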

#### **Table 3.**

*Results of the rmANOVA for* perceived source distance *D̂ (music, co-presence paradigm).*

#### **Table 4.**

*Results of the rmANOVA for* perceived source distance *D̂ (speech, co-presence paradigm).*

#### **Figure 2.**

*Means (markers) and standard errors (bars) of* perceived source distance *D̂ against factor levels of* Room *and* Domain *for music. Horizontal lines indicate the particular physical source distance D within each room. Bold labels indicate low-absorbent rooms.*

#### **Figure 3.**

*Means (markers) and standard errors (bars) of* perceived source distance *D̂ against factor levels of* Room *and* Domain *for speech. Horizontal lines indicate the particular physical source distance D within each room. Bold labels indicate low-absorbent rooms.*

Regarding RQ 1, a priori main contrasts indicated that the mean estimates at level **V** were considerably higher than those at level **A**. The mean differences accounted for 2.95 m (music), *F*(1,49) = 52.910, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.519, and for 2.38 m (speech), *F*(1,49) = 32.712, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.469. This is also consistent on a descriptive basis across all rooms except JC, which involves the smallest physical distance (*s* = 7.19 m) and shows a lower mean estimate under the **V** than under the **A** condition for both music and speech. Looking at RQ 2, the mean estimates at level **AV** were higher than the average of the mean estimates at levels **A** and **V**. The mean differences accounted for 1.04 m in the music group (a priori main contrast), *F*(1,49) = 13.141, *p* = 0.001, η<sup>2</sup><sub>P</sub> = 0.211, and 0.24 m in the speech group (contrast not significant). The **AV** mean estimates were located at 85% of the range between the mean estimates at levels **V** and **A** in the music group and at 60% in the speech group.

Looking at the accuracy of the estimates, the mean estimates differed from the mean physical source distance by −2.36 m (−22.7%) at level **A**, +0.59 m (+5.7%) at level **V**, and +0.16 m (+1.5%) at level **AV** in the music group, and by −3.02 m (−29.1%) at level **A**, −0.63 m (−6.1%) at level **V**, and −1.59 m (−15.3%) at level **AV** in the speech group. Overall, the physical distances were met best by the estimates at level **AV** in the music group, and by the estimates at level **V** in the speech group.
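The reported range positions (85% and 60%) follow directly from the two contrast differences. A minimal sketch (the helper function is ours, not from the chapter), using the mean differences reported above:

```python
def av_position(diff_v_minus_a, diff_av_minus_midpoint):
    """Position of the AV mean on the A-to-V range (0 = A mean, 1 = V mean).

    diff_v_minus_a:          mean difference V - A (the RQ 1 contrast)
    diff_av_minus_midpoint:  mean difference AV - (A + V)/2 (the RQ 2 contrast)
    """
    return 0.5 + diff_av_minus_midpoint / diff_v_minus_a

print(round(100 * av_position(2.95, 1.04)))  # music → 85
print(round(100 * av_position(2.38, 0.24)))  # speech → 60
```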

#### *3.1.2 Conflicting stimulus paradigm*

*Auralized room* and *Visualized room* showed significant main effects on source distance for both music (**Table 5**) and speech (**Table 6**), however, no significant interaction effect. Effects of *Auralized room* were of small size, whereas effects of *Visualized room* were classified as large. Regarding music, η<sup>2</sup><sub>A(net)</sub> = 7% of the proportion of the explained variance (see 2.7) arose from *Auralized room*, η<sup>2</sup><sub>V(net)</sub> = 91% from *Visualized room*. Under the speech condition, the proportions accounted for 11% (*Auralized room*) and 88% (*Visualized room*).

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Auralized room* | 469.724 | 5.000 | 93.945 | 13.143 | <0.001 | 0.017 | 0.133 | 0.023 | 0.211 | >0.999 |
| Error (*A. room*) | 1751.252 | 131.608 | 13.307 | | | | | | | |
| *Visualized room* | 6256.608 | 2.602 | 2404.324 | 105.444 | <0.001 | 0.233 | 0.551 | 0.238 | 0.683 | >0.999 |
| Error (*V. room*) | 2907.446 | 127.509 | 22.802 | | | | | | | |
| *A. room* × *V. room* | 134.833 | 13.677 | 9.858 | 1.566 | 0.086 | 0.005 | 0.071 | 0.007 | 0.031 | 0.868 |
| Error (*A. r.* × *V. r.*) | 4219.961 | 670.192 | 6.297 | | | | | | | |

#### **Table 5.**

*Results of the rmANOVA for* perceived source distance *D̂ (music, conflicting stimulus paradigm).*

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Auralized room* | 375.912 | 1.667 | 225.526 | 9.314 | 0.001 | 0.023 | 0.153 | 0.028 | 0.201 | 0.951 |
| Error (*A. room*) | 1493.259 | 61.672 | 24.213 | | | | | | | |
| *Visualized room* | 2936.460 | 2.724 | 1077.931 | 48.375 | <0.001 | 0.178 | 0.465 | 0.183 | 0.567 | >0.999 |
| Error (*V. room*) | 2245.993 | 100.794 | 22.283 | | | | | | | |
| *A. room* × *V. room* | 31.620 | 12.609 | 2.508 | 0.531 | 0.902 | 0.002 | 0.044 | 0.002 | 0.014 | 0.317 |
| Error (*A. r.* × *V. r.*) | 2203.942 | 466.540 | 4.724 | | | | | | | |

#### **Table 6.**

*Results of the rmANOVA for* perceived source distance *D̂ (speech, conflicting stimulus paradigm).*

#### **Figure 4.**

*Means (markers) and standard errors (bars) of* perceived source distance *D̂ against factor levels of* Auralized room *and* Visualized room *for music. Dots within markers indicate acoustic-visual congruency.*

**Figures 4** and **5** show the generally lower mean distance estimates for speech by trend. The figures also illustrate the ranges of the mean estimates. The average range of mean estimates caused by *Auralized room* was 1.69 m, while the range caused by *Visualized room* accounted for 5.74 m. The range of the physical source distance was 8.65 m. As a rule, the auralized room KE led to a maximal mean estimate and the auralized room RT to a minimal mean estimate within each visualized room. In turn, the visualized room KE led to a maximal mean estimate and the visualized room JC to a minimal mean estimate within each auralized room. The mean estimates do not indicate that acoustic-visual congruency as such yielded maximal, minimal or especially accurate mean distance estimates.


#### **Figure 5.**

*Means (markers) and standard errors (bars) of* perceived source distance *D̂ against factor levels of* Auralized room *and* Visualized room *for speech. Dots within markers indicate acoustic-visual congruency.*
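As a quick plausibility check on the rmANOVA tables, each printed *F* statistic is simply the ratio of the effect mean square to its error mean square (with Greenhouse-Geisser-adjusted degrees of freedom). A one-line illustrative sketch (the helper is ours), applied to the *Visualized room* row of **Table 5**:

```python
def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of the effect mean square to its error mean square."""
    return ms_effect / ms_error

# Visualized room, music (Table 5): MS = 2404.324, error MS = 22.802
print(round(f_ratio(2404.324, 22.802), 3))  # → 105.444, matching the printed F
```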

#### **3.2 Perceived room size**

#### *3.2.1 Co-presence paradigm*

Room size showed significant main and interaction effects of *Domain* and *Room* for both music (**Table 7**) and speech (**Table 8**). Effects were of large size for *Domain* (music) and *Room* and of medium size for *Domain* (speech) and *Domain* × *Room*. The mean size estimates were slightly lower for speech than for music by trend (**Figures 6** and **7**).

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Domain* | 10148.965 | 1.651 | 6145.611 | 70.421 | <0.001 | 0.115 | 0.361 | 0.180 | 0.590 | >0.999 |
| Error (*Domain*) | 7061.808 | 80.919 | 87.270 | | | | | | | |
| *Room* | 28109.442 | 3.650 | 7701.183 | 226.890 | <0.001 | 0.319 | 0.685 | 0.379 | 0.822 | >0.999 |
| Error (*Room*) | 6070.632 | 178.851 | 33.942 | | | | | | | |
| *Domain* × *Room* | 3733.981 | 7.522 | 496.424 | 21.814 | <0.001 | 0.042 | 0.210 | 0.075 | 0.308 | >0.999 |
| Error (*D.* × *R.*) | 8387.358 | 315.667 | 26.570 | | | | | | | |

#### **Table 7.**

*Results of the rmANOVA for* perceived room size *Ŝ (music, co-presence paradigm).*

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Domain* | 6484.837 | 1.513 | 4285.200 | 42.093 | <0.001 | 0.082 | 0.299 | 0.131 | 0.532 | >0.999 |
| Error (*Domain*) | 5700.224 | 55.992 | 101.803 | | | | | | | |
| *Room* | 26573.103 | 2.994 | 8875.557 | 165.259 | <0.001 | 0.337 | 0.713 | 0.383 | 0.817 | >0.999 |
| Error (*Room*) | 5949.496 | 101.789 | 58.449 | | | | | | | |
| *Domain* × *Room* | 3000.770 | 6.785 | 442.292 | 19.791 | <0.001 | 0.038 | 0.199 | 0.065 | 0.348 | >0.999 |
| Error (*D.* × *R.*) | 5610.150 | 251.030 | 22.349 | | | | | | | |

#### **Table 8.**

*Results of the rmANOVA for* perceived room size *Ŝ (speech, co-presence paradigm).*

#### **Figure 6.**

*Means (markers) and standard errors (bars) of* perceived room size *Ŝ against factor levels of* Room *and* Domain *for music. Horizontal lines indicate the particular physical room size S of each room. Bold labels indicate low-absorbent rooms.*

#### **Figure 7.**

*Means (markers) and standard errors (bars) of* perceived room size *Ŝ against factor levels of* Room *and* Domain *for speech. Horizontal lines indicate the particular physical room size S of each room. Bold labels indicate low-absorbent rooms.*


Regarding RQ 1, a priori contrasts indicated that the mean estimates at level **V** were considerably higher than those at level **A**. The mean differences accounted for 7.40 m (music), *F*(1,49) = 97.748, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.666, and for 6.71 m (speech), *F*(1,49) = 51.457, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.582. Looking at RQ 2, a priori contrasts showed that the mean estimates at level **AV** were higher than the average of the mean estimates at levels **A** and **V**. The mean differences accounted for 3.11 m (music), *F*(1,49) = 32.124, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.396, and 2.99 m (speech), *F*(1,49) = 24.933, *p* < 0.001, η<sup>2</sup><sub>P</sub> = 0.403. The **AV** estimates were located at 92% of the range between the mean estimates at levels **V** and **A** in the music group and at 94% in the speech group.

As with source distance, the range of the mean room size estimates introduced by the factor *Domain* was lower for the low-absorbent (wet) and higher for the high-absorbent (dry) rooms, even though this was not hypothesized or tested.

Accuracies were generally low regardless of the level of *Domain*. The mean room size estimates differed from the mean physical room size by −2.00 m (−10.0%) at level **A**, +5.47 m (+27.5%) at level **V**, and +4.86 m (+24.5%) at level **AV** in the music group, and by −3.08 m (−15.5%) at level **A**, +3.72 m (+18.7%) at level **V**, and +3.34 m (+16.8%) at level **AV** in the speech group. Overall, the physical sizes were generally best approximated by the estimates at level **A**. Specifically, in low-absorbent rooms (KH, JC, KE) and the small dry room (RT), physical room sizes were best approximated by the estimates at level **A**, whereas in medium- and large-sized dry rooms (KO, GH) they were best approximated by the estimates at levels **AV** and **V**.

#### *3.2.2 Conflicting stimulus paradigm*

*Auralized room* and *Visualized room* showed significant main effects on room size for both music (**Table 9**) and speech (**Table 10**), however, no significant interaction effect. Effects of *Auralized room* were of small size, whereas effects of *Visualized room* were classified as large. Regarding music, η<sup>2</sup><sub>A(net)</sub> = 9% of the proportion of the explained variance (see 2.7) arose from *Auralized room*, η<sup>2</sup><sub>V(net)</sub> = 90% from *Visualized room*. Under the speech condition, the proportions accounted for 14% (*Auralized room*) and 85% (*Visualized room*).

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Auralized room* | 3275.179 | 2.048 | 1599.570 | 25.911 | <0.001 | 0.024 | 0.156 | 0.031 | 0.346 | >0.999 |
| Error (*A. room*) | 6193.742 | 100.329 | 61.734 | | | | | | | |
| *Visualized room* | 32107.238 | 3.203 | 10025.415 | 110.275 | <0.001 | 0.233 | 0.551 | 0.239 | 0.692 | >0.999 |
| Error (*V. room*) | 14266.617 | 156.927 | 90.913 | | | | | | | |
| *A. room* × *V. room* | 375.257 | 12.004 | 31.262 | 1.344 | 0.189 | 0.003 | 0.052 | 0.004 | 0.027 | 0.754 |
| Error (*A. r.* × *V. r.*) | 13678.450 | 588.172 | 23.256 | | | | | | | |

#### **Table 9.**

*Results of the rmANOVA for* perceived room size *Ŝ (music, conflicting stimulus paradigm).*

| **S. o. V.** | *SS* | *df*<sub>adj</sub> | *MS* | *F* | *p* | **η<sup>2</sup>** | *f* | **η<sup>2</sup><sub>G</sub>** | **η<sup>2</sup><sub>P</sub>** | **1-β** |
|---|---|---|---|---|---|---|---|---|---|---|
| *Auralized room* | 3799.130 | 1.446 | 2626.800 | 11.517 | <0.001 | 0.026 | 0.162 | 0.030 | 0.237 | 0.968 |
| Error (*A. room*) | 12205.465 | 53.513 | 228.084 | | | | | | | |
| *Visualized room* | 23087.978 | 2.228 | 10363.307 | 54.821 | <0.001 | 0.155 | 0.429 | 0.160 | 0.597 | >0.999 |
| Error (*V. room*) | 15582.628 | 82.431 | 189.039 | | | | | | | |
| *A. room* × *V. room* | 185.804 | 7.540 | 24.642 | 0.662 | 0.716 | 0.001 | 0.035 | 0.002 | 0.018 | 0.296 |
| Error (*A. r.* × *V. r.*) | 10382.097 | 278.982 | 37.214 | | | | | | | |

#### **Table 10.**

*Results of the rmANOVA for* perceived room size *Ŝ (speech, conflicting stimulus paradigm).*

#### **Figure 8.**

*Means (markers) and standard errors (bars) of* perceived room size *Ŝ against factor levels of* Auralized room *and* Visualized room *for music. Dots within markers indicate acoustic-visual congruency.*

#### **Figure 9.**

*Means (markers) and standard errors (bars) of* perceived room size *Ŝ against factor levels of* Auralized room *and* Visualized room *for speech. Dots within markers indicate acoustic-visual congruency.*

**Figures 8** and **9** show the generally lower mean room size estimates for speech by trend. The figures also illustrate the ranges of the mean estimates. The average range of mean estimates caused by *Auralized room* was 4.54 m, while the range caused by *Visualized room* accounted for 10.99 m. The range of the physical room size was 15.72 m. As a rule, the auralized room KE led to a maximal mean estimate and the auralized room RT to a minimal mean estimate within each visualized room. In turn, the visualized room KE led to a maximal mean estimate and the visualized room KH mostly to a minimal mean estimate within each auralized room. The mean estimates do not indicate that acoustic-visual congruency as such yielded maximal, minimal or especially accurate mean size estimates.

#### **4.3 Properties of auralized and visualized rooms**

Considering the multi-domain mode of perception and applying the conflicting stimulus paradigm, the distance and size estimates depended significantly on both the acoustic and the visual properties of the stimuli (RQs 3 and 4). Generally, about 89% of the explained variance arose from the entire visual and 10% from the entire acoustic information provided by the virtual environment. For both egocentric distance and room size perception, acoustic information showed a slightly greater proportion of explained variance under the speech than under the music condition.

In accordance with the MLE modeling of auditory-visual integration in principle, the acoustic and visual proportions of the explained variance appear to vary strongly according to the availability and, respectively, the richness of the cues in the particular domains: A preliminary experiment under substantially restricted visualization conditions (reduced field of view, reduced spatial resolution, still photographs instead of moving pictures, no maximal acoustic-visual congruency due to visible loudspeakers as sound sources) and non-restricted auralization conditions (identical auralization system) yielded a reversed order of proportions of the explained variance (cf. 2.7), which amounted to 33% for factor *Visualized room* and 66% for factor *Auralized room* ([18], p. 392).

Against the background of the prevalent term *auditory-visual interaction* (or similar), it is remarkable that at least no *statistical* interaction effect of the acoustic and the visual stimulus properties on egocentric source distance and room size perception was found to be significant (1.3, RQ 5). Looking at perceived geometric dimensions as supramodal unified features specifying spatial notions, both acoustic and visual properties, and therefore both the auditory and the visual modalities, appear to contribute (regardless of variable weights) directly to the values of these features, and no interaction (non-additive) effects appear to complicate this straightforward principle. Hence, the modeling of auditory-visual integration of distance and room size perception will not have to include non-additive effects for the time being.

Since the involved modalities and the mode of perception were constant across all factor levels, it may be assumed that VR-induced biases apply likewise to all factor levels of the conflicting stimulus paradigm and their combinations. Hence, the findings on RQs 3 to 5, i.e., the inferential statistics and the η<sup>2</sup>-based proportional accounts for the estimates, may be transferred from virtuality to reality in principle. At the descriptive level, the estimates might again be compensated for virtualization by loading them with *RVR*<sub>distance,AV</sub> = 1.284 and *RVR*<sub>size,AV</sub> = 1.191, respectively [38].

Within the test design, the presence and properties of the acoustic and visual domains were varied to experimentally dissociate the auditory and the visual modalities. Because this variation was categorical, i.e., comprising the entire *environmental* conditions of the scenes instead of either mere *distance* or mere *room size* cues, the results may be transferred to the perceptual modalities hearing and vision as such—at least for closed spaces, and within the boundaries of generalization given by the content types, rooms, and samples. Auditory-visual distance perception may in principle be influenced not only by physical distance, but by any structural (room size, room shape) and material properties that affect those acoustic cues (1.2) that are also affected by physical distance (cf. [124]). Since the domain

**4.4 Complex independent variables and interfering factors**
