*3.1.2 Protocol space coverage*

Space coverage is a set of two metrics, *scangle* and *scshape*, designed to provide insight into the density of the points tested during the localisation task, as well as the homogeneity of their distribution on the sphere. *scangle* represents the density of the evaluated positions for a given test protocol. It is computed from the spherical Voronoi diagram built from the evaluated positions, as the average of the solid angles of its cells [34], accompanied (±) by its standard deviation. As illustrated in **Figure 2**, denser grids result in smaller *scangle*, with the standard deviation decreasing for increasingly homogeneous distributions.

<sup>2</sup> MATLAB auditory localisation evaluation toolbox: https://hal.archives-ouvertes.fr/hal-03265190.

*HRTF Performance Evaluation: Methodology and Metrics for Localisation Accuracy… DOI: http://dx.doi.org/10.5772/intechopen.104931*

**Figure 2.**

*Various test grids and associated space coverage statistics. (a) Homogeneous grid with large number of points, (b) homogeneous grid with small number of points, (c) non-homogeneous grid with small number of points, and (d) horizontal grid with small number of points.*

*scshape* is computed as the average over the shape indices of the cells of the Voronoi diagram, defined as:

$$\text{shape\_index} = 4\pi \, \frac{\text{cell\_area}}{(\text{cell\_perimeter})^2} \tag{1}$$

where the perimeter is computed as the sum of the great-circle distances between the cell vertices, expressed in radians. The squared value of the perimeter and the 4*π* normalisation factor are used so that the final shape index is defined in [0, 1]. Cells shaped as circles will have an index close to 1, whereas the index decreases towards 0 as the cell grows into an elongated polygon. As illustrated in **Figure 2**, *scshape* is used in addition to the *scangle* standard deviation to detect uneven evaluation grid distributions. Note that grid density has a negative impact on *scshape*, which drops from 0.91 to 0.84 for uniform grids of 20 and 80 points respectively [35].

## *3.1.3 Great circle error and angular direction*

The great-circle error is defined as the minimum arc between the response and the true target position. This metric provides an intuitive way to assess the local localisation accuracy as the spherical distance between the responses and the target. Given *xyztarget* and *xyzresponse* as the vectors in Cartesian coordinates of the target and response positions respectively, the great-circle error is defined in [0°:180°] as:

$$\text{great\_circle\_error} = \arctan\left(\frac{\left\lVert \mathit{xyz}_{\text{target}} \times \mathit{xyz}_{\text{response}} \right\rVert}{\mathit{xyz}_{\text{target}} \cdot \mathit{xyz}_{\text{response}}}\right) \tag{2}$$

where smaller values correspond to better localisation performance.
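A minimal Python sketch of Eq. (2) is given below. It assumes the two-argument arctangent (`atan2`) is used, so that the result covers the full [0°:180°] range stated above; the function name is hypothetical.

```python
import math

def great_circle_error(xyz_target, xyz_response):
    """Great-circle error in degrees between two Cartesian direction vectors."""
    tx, ty, tz = xyz_target
    rx, ry, rz = xyz_response
    # Norm of the cross product (proportional to the sine of the angle).
    cross = (ty * rz - tz * ry, tz * rx - tx * rz, tx * ry - ty * rx)
    cross_norm = math.sqrt(sum(c * c for c in cross))
    # Dot product (proportional to the cosine of the angle).
    dot = tx * rx + ty * ry + tz * rz
    # atan2 (assumption) resolves the quadrant, giving values in [0, 180].
    return math.degrees(math.atan2(cross_norm, dot))

print(great_circle_error((1, 0, 0), (0, 1, 0)))   # orthogonal directions
print(great_circle_error((1, 0, 0), (-1, 0, 0)))  # opposite directions
```

Because both the cross-product norm and the dot product scale with the vector lengths, the inputs do not need to be normalised.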

The angular direction is coupled to the great-circle error to enable vector summation of target-to-response arcs on the sphere. The direction towards the right ear constitutes the positive pole in the interaural coordinate system. The angular direction may then be calculated from the interaural coordinates as:

$$\text{angular}_{\text{dir.}} = \arctan\left(\frac{\cos\left(\alpha_{\text{rep}}\right)\sin\left(\beta_{\text{rep}} - \beta_{\text{target}}\right)}{\cos\left(\alpha_{\text{target}}\right)\sin\left(\alpha_{\text{rep}}\right) - \sin\left(\alpha_{\text{target}}\right)\cos\left(\alpha_{\text{rep}}\right)\cos\left(\beta_{\text{rep}} - \beta_{\text{target}}\right)}\right) \tag{3}$$

where *α* is the lateral angle and *β* is the polar angle.
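Eq. (3) can be sketched in Python as follows. The function name and argument order are hypothetical, angles are assumed to be given in degrees, and `atan2` is assumed in place of the single-argument arctangent so that the direction is resolved over the full circle.

```python
import math

def angular_direction(lat_target, pol_target, lat_rep, pol_rep):
    """Direction (degrees) of the target-to-response arc, interaural angles in degrees."""
    a_t, b_t, a_r, b_r = map(math.radians, (lat_target, pol_target, lat_rep, pol_rep))
    # Numerator and denominator of Eq. (3).
    num = math.cos(a_r) * math.sin(b_r - b_t)
    den = (math.cos(a_t) * math.sin(a_r)
           - math.sin(a_t) * math.cos(a_r) * math.cos(b_r - b_t))
    # atan2 (assumption) keeps the sign information of both terms.
    return math.degrees(math.atan2(num, den))

# Response shifted purely in lateral angle, then purely in polar angle.
print(angular_direction(0, 0, 10, 0))
print(angular_direction(0, 0, 0, 10))
```

With this convention, a response displaced only towards increasing lateral angle yields a direction of 0°, and one displaced only towards increasing polar angle yields 90°.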

### *3.1.4 Confusion classification*

As discussed in Section 2.2, confusion classification schemes are primarily designed to separate small localisation errors from larger errors caused by erroneous localisation behaviours typically observed in binaural localisation tasks. The scheme used in the methodology is designed around notions borrowed from both cone-of-confusion [8, 10, 16, 29] and sphere quadrant [12, 14] classifications. It separates responses into four categories: those near the target (*precision* errors), those opposite the target with respect to the *YZ* plane (*front-back* errors), those within the target cone-of-confusion (*in-cone* errors), and the remainder (*off-cone* errors).

The classification is illustrated in **Figure 3a**. Responses within a 45° radius cone around the target are defined as precision errors. Responses within a 45° cone around the mirror image of the target position with respect to the *YZ* plane, not already classified as precision errors, are defined as front-back errors. Responses with a lateral angle within 45° of that of the target, not already classified as either precision or front-back errors, are defined as in-cone errors. Remaining responses are defined as off-cone errors. **Figure 3b** and **c** schematically show several alternative approaches that were evaluated before choosing the current method (discussed in more detail below).
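The decision sequence described above can be sketched in Python as follows. The coordinate conventions are assumptions made for the example: a listener facing X with the left ear towards Y and Z pointing up, azimuth counted positive towards the left, and lateral angle positive towards the right ear. All function names are hypothetical.

```python
import math

def to_xyz(azimuth_deg, elevation_deg):
    """Spherical (azimuth, elevation) in degrees to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def arc_deg(u, v):
    """Great-circle angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def lateral_deg(xyz):
    """Lateral angle in degrees, positive towards the right ear (-Y, assumed)."""
    return math.degrees(math.asin(max(-1.0, min(1.0, -xyz[1]))))

def classify(target, response, threshold_deg=45.0):
    """Return the confusion category of a response for a given target."""
    # Mirror image of the target with respect to the YZ plane.
    mirrored = (-target[0], target[1], target[2])
    if arc_deg(target, response) <= threshold_deg:
        return "precision"
    if arc_deg(mirrored, response) <= threshold_deg:
        return "front-back"
    if abs(lateral_deg(response) - lateral_deg(target)) <= threshold_deg:
        return "in-cone"
    return "off-cone"

target = to_xyz(35, 10)
for az, el in [(40, 15), (145, 10), (35, -80), (-90, 0)]:
    print((az, el), classify(target, to_xyz(az, el)))
```

Since each test is applied only to responses not yet classified, every response falls into exactly one category.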

The proposed 45° threshold value is somewhat arbitrary, based on a segmentation of localisation error distributions of responses from previous studies [8–10]. This value can be adapted depending on the context of the study and the nominal localisation accuracy expected. To improve understanding, the evolution of confusion zones for a 20° threshold and various target positions is illustrated in **Figure 4**. The four confusion category rates always sum to 100%.

**Figure 3.**

*Confusion type as a function of response position on the sphere, for a target at spherical coordinates (35°, 10°) and a listener facing X with his left ear pointing towards Y. (a) Proposed classification scheme, (b) classification used in Stitt et al. [10] based on polar angle only, and (c) attempt at solving pole compression issues of (b).*

The distinction between in- and off-cone confusions is inspired by the duplex theory [36, 37], separating responses based on whether they are caused by misinterpreting monaural cues (in-cone confusions) or binaural cues (off-cone confusions). The commonly cited front-back confusion category has been maintained, despite not having a clearly identified origin in signal symmetry, as it represents a behaviour frequently observed in localisation studies [38]. Other confusion categories were considered for this scheme, such as up-down or combined up-down-front-back confusions. They were discarded, however, as their representative patterns were not prevalent in the ≈10000 participant responses analysed in Section 4 or in the meta-analysis of ≈80000 free-field responses by Best et al. [38].

Compared to traditional cone-of-confusion classifications defined using only polar angle [8, 10, 16, 29], the main drawback of the proposed scheme is that it is susceptible to ITD mismatch. By only looking at the difference in *polar* angle between target and response, the traditional classifications are not impacted by participants misinterpreting the ITD of the target, and focus on characterising the interpretation of monaural cues. As illustrated in **Figure 3b**, the problem with these classifications is that they have a high rate of false error detection at the poles of the interaural coordinate system, where a small shift in response can be interpreted as *e.g.* a front-back confusion instead of a precision error.

An attempt was made to propose a new scheme, inspired by the one used in Stitt et al. [10], alleviating the pole issue by increasing the (polar) spread of the precision zone as targets approach the poles, constraining said spread to always span 45° of great-circle angle when projected on the sphere. As illustrated in **Figure 3c**, this constraint results in an undesirable warping of the precision error zone for targets within a certain lateral distance from the poles.

The solution proposed for studies needing a classification based on monaural cues interpretation alone is to extend the proposed scheme, artificially adjusting the lateral position of targets prior to the classification to discard errors related to ITD mismatch. This adjustment can be made on a per-participant/target basis, replacing the lateral angle of targets by the mean lateral angle of their associated responses prior to the classification. It can also be performed on a per-response basis by simply assuming that targets and responses always have the same lateral position. The case study of Section 4 uses the second, simple, non-adaptive form of the classification scheme.
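The lateral adjustment described above can be sketched in Python as follows. The interaural conversion is an assumption made for the example (listener facing X, left ear towards Y, lateral angle positive towards the right ear, polar angle measured from the front in the XZ plane), and all function names are hypothetical.

```python
import math

def to_interaural(xyz):
    """Cartesian unit vector to (lateral, polar) in radians (assumed convention)."""
    x, y, z = xyz
    lateral = math.asin(max(-1.0, min(1.0, -y)))
    polar = math.atan2(z, x)
    return lateral, polar

def from_interaural(lateral, polar):
    """(lateral, polar) in radians back to a Cartesian unit vector."""
    return (math.cos(lateral) * math.cos(polar),
            -math.sin(lateral),
            math.cos(lateral) * math.sin(polar))

def with_lateral(target_xyz, lateral):
    """Target with its lateral angle replaced, polar angle unchanged."""
    _, polar = to_interaural(target_xyz)
    return from_interaural(lateral, polar)

target = from_interaural(math.radians(-30.0), math.radians(20.0))
response = from_interaural(math.radians(10.0), math.radians(50.0))

# Per-response variant: the target takes the lateral angle of the response.
adjusted = with_lateral(target, to_interaural(response)[0])

# Per-participant/target variant: use the mean lateral angle of the responses.
responses = [response, from_interaural(math.radians(20.0), math.radians(40.0))]
mean_lateral = sum(to_interaural(r)[0] for r in responses) / len(responses)
adjusted_mean = with_lateral(target, mean_lateral)
```

The adjusted target would then be passed to the classification in place of the original one, so that only the polar (monaural-cue) component of the error drives the result.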

**Figure 4.**

*Confusion type as a function of response position on the sphere for the proposed classification scheme with an angle threshold of 20° and a listener facing X with his left ear pointing towards Y. Target at spherical coordinates (a) (35°, 10°), (b) (70°, 40°), and (c) (80°, 10°).*

#### *3.1.5 Azimuth, elevation, lateral, and polar errors and biases*

Lateral and polar errors are defined as the absolute difference between target and response positions in interaural coordinates. They are used to project localisation errors onto spatial dimensions associated with separate cues in the HRTF, allowing for an analysis of their independent contribution to the overall performance. Both are defined in [0°:180°], where smaller values correspond to better localisation performance. In the methodology, lateral and polar errors are evaluated only on responses classified as *precision* confusions, and are hence referred to as *local* lateral and polar errors. This restriction avoids the discontinuities discussed in Section 2.1.2 as well as the hazardous interpretation of values compounding local errors and spatial confusions.

As mentioned in Section 2.1.3, compression at the poles will lead to artificially inflated polar errors for targets near the interaural axis. A weight that depends on the target lateral angle can be applied to the polar error to compensate for the compression, defining the *polar error weighted* as:

$$\text{polar\_error\_weighted} = \text{polar\_error} \cdot \cos\left(\alpha_{\text{target}}\right) \tag{4}$$

This weight is designed so that, for a target and a response that share the same lateral angle, the polar error weighted is equal to the arc length (great-circle) that separates them, regardless of said lateral angle. Note that while lateral error is not impacted by pole compression, it 'folds' near the interaural axis: random responses will overall have a lower local lateral error for targets in this region. This is a valuable feature of the interaural system when assessing the symmetric contribution of binaural cues (ITD/ILD) to localisation error. It can nonetheless lead to artificially deflated lateral errors when used in a different context.
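A minimal Python sketch of Eq. (4) follows. Function names are hypothetical, angles are in degrees, and the wrap-around of the polar difference into [0°:180°] is an assumption about how the absolute difference is taken on a circular variable.

```python
import math

def polar_error(pol_target_deg, pol_response_deg):
    """Absolute polar difference in degrees, wrapped to [0, 180] (assumption)."""
    diff = abs(pol_target_deg - pol_response_deg) % 360.0
    return min(diff, 360.0 - diff)

def polar_error_weighted(lat_target_deg, pol_target_deg, pol_response_deg):
    """Polar error scaled by the cosine of the target lateral angle (Eq. 4)."""
    weight = math.cos(math.radians(lat_target_deg))
    return polar_error(pol_target_deg, pol_response_deg) * weight

# Same 20° polar error, for a target on the median plane and one at 60° lateral.
print(polar_error_weighted(0.0, 0.0, 20.0))
print(polar_error_weighted(60.0, 0.0, 20.0))
```

For a target at 60° lateral angle the weight is cos(60°) = 0.5, so a 20° polar error is reduced to roughly 10°, while it is left unchanged on the median plane.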

Azimuth and elevation errors are defined as the absolute difference between target and response positions in spherical coordinates. They correspond to a more traditional projection of spherical coordinates, more intuitive yet no longer guided by auditory cue separation. Like interaural errors, azimuth and elevation errors are defined in [0°:180°] and will be used only for local precision evaluation. As for polar error, azimuth error compression near the poles can be compensated for, defining the *azimuth error weighted* as:

$$\text{azimuth\_error\_weighted} = \text{azimuth\_error} \cdot \cos\left(\varphi_{\text{target}}\right) \tag{5}$$

In addition to absolute errors, *signed* lateral and elevation errors are used in the methodology. Mean signed errors, referred to as *biases*, are typically used to examine systematic rotational biases, induced for example by an offset between the tracking system used for measuring the HRTF and that used during the evaluation task, or reporting bias. As for absolute errors, usage of both metrics will be restricted to responses classified as precision confusions.

Finally, lateral and elevation *compression* errors are used to highlight space compression and dilation effects. *Lateral compression* is defined as |*αtarget*| − |*αresponse*|, so that a positive error corresponds to a compression towards the median plane *ZX*. Conversely, a negative error corresponds to a dilation away from the median plane. Similarly, the *elevation compression* is defined as |*φtarget*| − |*φresponse*|, so that a positive error corresponds to a compression towards the horizontal plane *XY*. Conversely, a negative error corresponds to a dilation away from the horizontal plane. Compression errors are for example used to characterise a pointing bias caused by the reporting interface, or to detect lateral compressions resulting from an ITD mismatch between the presented HRTF and that of the participants.
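The bias and compression metrics can be sketched in Python as follows. Function names are hypothetical, angles are in degrees, and the sign convention of the signed error (response minus target) is an assumption, as the text does not specify it.

```python
def signed_error(target_deg, response_deg):
    """Signed error in degrees, response minus target (assumed convention)."""
    return response_deg - target_deg

def bias(targets_deg, responses_deg):
    """Mean signed error over paired targets and responses."""
    errors = [signed_error(t, r) for t, r in zip(targets_deg, responses_deg)]
    return sum(errors) / len(errors)

def compression(target_deg, response_deg):
    """|target| - |response|: positive when the response is pulled towards 0°."""
    return abs(target_deg) - abs(response_deg)

# Lateral angles: responses pulled towards the median plane on both sides.
print(compression(40.0, 30.0))    # positive: compression
print(compression(-40.0, -30.0))  # positive: compression on the other side
print(compression(40.0, 50.0))    # negative: dilation
print(bias([10.0, -20.0, 0.0], [12.0, -19.0, 3.0]))
```

The same `compression` function applies to elevation angles, in which case a positive value indicates a compression towards the horizontal plane.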
