*3.1.6 Sphere regions*

The decomposition of the analysis into sphere regions depends on the context. As such, there exists no single ideal decomposition scheme. To support the case study presented in the next section, the sphere will be split into 6 regions: *front-up* (*x*>0 and *z*>0), *front-down* (*x*>0 and *z*<0), *back-up* (*x*<0 and *z*>0), *back-down* (*x*<0 and *z*<0), *left* (*y*>0), and *right* (*y*<0). This scheme has been chosen to best highlight region-specific behaviours while remaining manageable, based on a preliminary analysis of the experiments studied in Section 4. The redundant *left* and *right* regions have been added for systematic checks on lateralisation discrepancies in participant responses.
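As an illustration, this six-region scheme reduces to simple sign tests on the Cartesian coordinates of a position. The helper names below are hypothetical, not taken from the study:

```python
def quadrant_region(x, y, z):
    """Assign a position to one of the four front/back x up/down regions
    (hypothetical helper illustrating the scheme described above)."""
    return ("front" if x > 0 else "back") + ("-up" if z > 0 else "-down")

def lateral_region(y):
    """Redundant left/right decomposition used for lateralisation checks."""
    return "left" if y > 0 else "right"
```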

#### **3.2 Methodology**

The methodology is proposed as a set of analysis steps, each building on the previous one to provide a comprehensive assessment of participants' localisation performance.

#### *3.2.1 Evaluation task characterisation*

The first step of the analysis is to assess how much of the space, *i.e.* the sphere, has been tested during the localisation task. In addition to depicting the grid of tested positions, this step reports its space coverage statistics as defined in Section 3.1.2. This provides readers with a simple set of metrics that reflects the spatial thoroughness of the evaluation, values they can use to qualify the study's conclusions as well as for inter-study comparisons.
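Since the coverage statistics themselves are defined in Section 3.1.2, the sketch below only illustrates the general idea with a stand-in metric: the fraction of roughly equal-area sphere cells that contain at least one tested position. The binning scheme and cell counts are illustrative choices, not the study's:

```python
import numpy as np

def coverage(targets, n_bands=6, n_cols=12):
    """Stand-in coverage statistic: fraction of roughly equal-area sphere
    cells containing at least one tested position. The cells combine bands
    of equal area in z with columns of equal width in azimuth.
    (Illustrative only; Section 3.1.2 defines the statistics actually used.)"""
    targets = np.asarray(targets, float)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    # equal-area z bands: z uniform on [-1, 1] gives equal-area slices
    band = np.minimum(((targets[:, 2] + 1) / 2 * n_bands).astype(int), n_bands - 1)
    az = np.arctan2(targets[:, 1], targets[:, 0])  # azimuth in (-pi, pi]
    col = np.minimum(((az + np.pi) / (2 * np.pi) * n_cols).astype(int), n_cols - 1)
    cells = set(zip(band.tolist(), col.tolist()))
    return len(cells) / (n_bands * n_cols)
```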

Atypical evaluation grids and their potential impact on participant results should also be discussed here. An evaluation on frontal field positions alone is likely to result in better overall performance compared to one encompassing the whole sphere, due to known variations of perceptual accuracy across sphere regions [5]. When using such grids, reporting the chance rates of the metrics, *i.e.* their values for responses randomly distributed on the sphere, as proposed by Majdak et al. [14], can greatly help readers appreciate the presented results. Another problematic example is the use of evaluation grids sparse enough for participants to identify and recall the tested positions, likely impacting their performance and the associated conclusions.
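Where an analytic chance rate is not available, it can be approximated numerically by drawing responses uniformly on the sphere and evaluating the metric against the tested grid. The sketch below does this for the mean great-circle error; it is a numerical stand-in for the chance rates proposed by Majdak et al. [14], not their method:

```python
import numpy as np

def random_sphere_points(n, rng):
    """Uniform directions on the unit sphere via normalised Gaussian samples."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def chance_mean_error(targets, n_samples=20000, seed=0):
    """Monte-Carlo estimate of the mean great-circle error (in degrees)
    expected from responses uniformly distributed on the sphere.
    (Illustrative stand-in for an analytic chance rate.)"""
    rng = np.random.default_rng(seed)
    responses = random_sphere_points(n_samples, rng)
    # pair each random response with a randomly drawn grid target
    idx = rng.integers(0, len(targets), size=n_samples)
    dots = np.clip(np.sum(responses * np.asarray(targets, float)[idx], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(dots)).mean()
```

By symmetry, uniformly random responses yield a mean great-circle error of about 90° regardless of the grid, which the estimate should reproduce.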

Finally, the stimulus characteristics (type, duration, *etc.*) as well as the reporting method should be described and discussed here, so that any systematic bias they may have on participant responses can be detected during the analysis.

#### *3.2.2 Assess global extent of localisation error*

The objective here is to get a rough overview of participant performance during the localisation task, simply answering the question "how far were responses from the true target position?". The assessment is based on the great-circle error as defined in Section 3.1.3.
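Assuming target and response positions are expressed as unit vectors, the great-circle error is simply the angle between them. This is a standard formulation; Section 3.1.3 gives the definition used in the methodology:

```python
import numpy as np

def great_circle_error(target, response):
    """Angular distance in degrees between two direction vectors
    (a common formulation of the great-circle error)."""
    t = np.asarray(target, float)
    r = np.asarray(response, float)
    t = t / np.linalg.norm(t)
    r = r / np.linalg.norm(r)
    # clip guards against rounding just outside [-1, 1]
    return np.degrees(np.arccos(np.clip(t @ r, -1.0, 1.0)))
```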

#### *3.2.3 Assess critical localisation confusions*

The next step consists in separating small precision errors from critical confusions. The nature and types of confusions are characterised early on, as they can have a critical impact on localisation performance, often far more detrimental than local localisation accuracy issues. This characterisation is performed using one of the classification methods defined in Section 3.1.3.
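Since the classification methods themselves are specified in Section 3.1.3, the following is only a sketch of a typical mirror-based scheme: a response is assigned to the closest of the target and its front-back / up-down mirror images, provided it falls within an angular threshold. The 45° threshold and mirror set are illustrative choices, not those of the methodology:

```python
import numpy as np

def angle(a, b):
    """Angle in degrees between two direction vectors."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    return np.degrees(np.arccos(np.clip(a @ b, -1.0, 1.0)))

def classify_response(target, response, threshold=45.0):
    """Sketch of a mirror-based confusion classification (illustrative
    threshold and mirrors; see Section 3.1.3 for the actual methods)."""
    tx, ty, tz = target
    mirrors = {
        "precision": (tx, ty, tz),    # true target position
        "front-back": (-tx, ty, tz),  # front-back mirror
        "up-down": (tx, ty, -tz),     # up-down mirror
        "combined": (-tx, ty, -tz),   # both mirrors combined
    }
    errs = {name: angle(pos, response) for name, pos in mirrors.items()}
    best = min(errs, key=errs.get)
    return best if errs[best] <= threshold else "off-target"
```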

#### *3.2.4 Assess local extent of localisation error*

This next step takes a closer look at responses classified as precision errors, *i.e.* the non-confused responses, to examine the local localisation performance. The mean great-circle error and angular direction of responses classified as precision errors are computed to analyse the extent of local errors. Note that this metric does not depend on the confusion classification method used, as precision errors are defined using the same criterion in both methods. Conclusions drawn from this local analysis should naturally be qualified by the percentage of responses it encompasses.
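In practice, the local analysis amounts to averaging great-circle errors over the precision-classified subset while reporting the share of responses that subset represents. A minimal sketch, with hypothetical helper names:

```python
import numpy as np

def local_error_summary(errors, labels):
    """Mean great-circle error over precision-classified responses, together
    with the share of all responses they represent (hypothetical helper).
    `errors` are per-response great-circle errors in degrees; `labels` are
    the classification outcomes for the same responses."""
    errors = np.asarray(errors, float)
    mask = np.asarray(labels) == "precision"
    share = mask.mean()
    mean_err = errors[mask].mean() if mask.any() else float("nan")
    return mean_err, share
```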

#### *3.2.5 Horizontal and vertical decomposition of the localisation error*

Whether or not this step should be included in the analysis, and which metrics it should make use of, depends on the context of the study. An experiment focusing on perceptual ITD adjustment, for example, would likely make use of both local lateral error as well as lateral compression. A training program attempting to fine-tune participant interpretation of monaural cues would on the other hand base its evaluation on the local polar error. For some studies, this decomposition will not make sense and should be avoided to limit Type I error inflation.
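One common way to obtain the lateral and polar components is the interaural-polar coordinate convention, sketched below for unit direction vectors. This is an assumption about the coordinate system; the study may adopt a different convention:

```python
import numpy as np

def interaural_polar(x, y, z):
    """Convert a unit direction vector to interaural-polar coordinates:
    lateral angle in [-90, 90] degrees (positive toward +y, the left) and
    polar angle in (-180, 180] degrees around the interaural axis
    (0 = front, 90 = above). A common convention, used here for illustration."""
    lateral = np.degrees(np.arcsin(np.clip(y, -1.0, 1.0)))
    polar = np.degrees(np.arctan2(z, x))
    return lateral, polar
```

Local lateral and polar errors then follow as differences between the target's and the response's respective components.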

#### *3.2.6 Decompose the analysis across sphere regions*

This final step consists in repeating all of the above, decomposing the analysis based on target positions to assess how participants fared in specific regions of the sphere. Given the loss of statistical power and the additional clutter that this analysis represents, it only needs to apply to those studies interested in characterising spatial imbalances in performance. The decomposition can then be performed using either a sphere-splitting scheme such as the one described in Section 3.1.6, or on a per-target-position basis. For example, this approach can be used to support the design of HRTF learning programs that would focus dynamically on those regions/confusions that are the most problematic [9].

To further characterise local localisation behaviours, the analysis can be complemented by evaluating average response positions and spherical response distributions. The former, computed by summing local great-circle error *vectors*, as discussed in Section 3.1.3, will help characterise variations of localisation accuracy across sphere regions [21]. The latter, characterised using Kent distributions (see Section 3.1.3), will provide the statistical framework to assess the significance of those variations.
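The average response position can be sketched as the resultant-vector mean commonly used for directional data: sum the unit response vectors and renormalise. Fitting the Kent distribution itself requires dedicated directional-statistics tooling and is omitted here:

```python
import numpy as np

def mean_direction(responses):
    """Resultant-vector (spherical) mean of a set of unit response vectors:
    sum the vectors and renormalise. A standard directional-statistics
    estimator, used here as an illustrative sketch."""
    v = np.asarray(responses, float).sum(axis=0)
    n = np.linalg.norm(v)
    # degenerate case: perfectly opposed responses have no mean direction
    return v / n if n > 0 else v
```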
