**2. State of the art**

This section presents and discusses a variety of metrics and methods of analysis introduced in previous studies for the evaluation of auditory localisation performance, in the context of HRTF selection and learning. Further, it discusses what aspect of the data or human behaviour is highlighted by each metric.

### **2.1 Analysis based on angular distances**

The majority of the metrics used in the literature to assess localisation performance are derived from the angular distance from the source position to the participant's response. This section discusses the most common of these metrics, their interpretation, and limitations. It builds upon the work presented in Letowski and Letowski [1].

#### *2.1.1 Egocentric coordinate systems*

Many auditory localisation tasks have participants indicating perceived target locations *around them*. As such, egocentric coordinate systems are a logical choice for the assessment of pointing errors. The *spherical* coordinate system, illustrated in **Figure 1a**, uses axes of azimuth and elevation angles. As most researchers are familiar with this coordinate system, it provides an intuitive framework to view and present results.

#### **Figure 1.**

*(a) Spherical, and (b) interaural coordinate systems used in the methodology, for a source positioned at angles (55°, 46°) as defined in each coordinate system. Spherical azimuth angle θ is defined in [−180°:180°], elevation angle φ in [−90°:90°]. Interaural lateral angle α is defined in [−90°:90°], polar angle β in [−180°:180°]. The lateral angle used here is shifted by 90° compared to that originally defined by Morimoto and Aokata [2]. In both systems, listeners are facing X with their left ear pointing towards Y.*

#### *Advances in Fundamental and Applied Research on Spatial Audio*

Alternatively, the *interaural* coordinate system has been proposed to evaluate localisation results as a more natural representation of how sound is perceived. The lateral angle, referred to as the "binaural disparity cue" by Morimoto and Aokata [2], defines *cones-of-confusion* along which the binaural cues of Interaural Level Difference (ILD) and Interaural Time Difference (ITD) are approximately constant. A cone-of-confusion is a set of positions presenting binaural cue/localisation ambiguities that listeners may not be able to differentiate unless provided with further spectral cues or head movement information [3]. While not truly 'cones', these constant ILD or ITD surfaces generally define a circle when the radius is fixed (see [4] for more discussion on the variation with radius of these constant-value surfaces). To maintain accepted terminology in the field, each of these circles is termed a "cone-of-confusion". The polar angle, or "spectral cue", is primarily linked with the monaural spectral cues in the HRTF. This independence of binaural and monaural cues makes the interaural coordinate system a compelling choice when assessing localisation performance, particularly when monaural cues are of special interest as in HRTF selection and learning tasks.
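The relation between the two coordinate systems can be sketched with a short conversion routine. This is a minimal illustration, not the chapter's own implementation. It assumes the axes of Figure 1 (listener facing +X, left ear towards +Y), adds an upward +Z axis, and assumes that azimuth and lateral angle are both positive towards the left. The function names are hypothetical.

```python
import math

def spherical_to_cartesian(azim_deg, elev_deg):
    """Unit vector for a direction given in spherical coordinates.

    Assumed convention: +X front, +Y left, +Z up, azimuth positive to the left.
    """
    az, el = math.radians(azim_deg), math.radians(elev_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def cartesian_to_interaural(x, y, z):
    """Lateral angle in [-90, 90] and polar angle in [-180, 180], in degrees."""
    # The lateral angle is measured from the median plane towards the interaural (Y) axis.
    lateral = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    # The polar angle is the rotation around the interaural axis, 0 deg pointing to the front.
    polar = math.degrees(math.atan2(z, x))
    return lateral, polar

def spherical_to_interaural(azim_deg, elev_deg):
    """Convert (azimuth, elevation) to (lateral, polar), all in degrees."""
    return cartesian_to_interaural(*spherical_to_cartesian(azim_deg, elev_deg))
```

With these assumptions, a source on the median plane behind and above the listener, for example at azimuth 180° and elevation 30°, maps to a lateral angle of 0° and a polar angle of 150°.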

Other conventions have been proposed, such as the *double-pole* [5] or *three-pole* [6] coordinate systems. These systems have been designed to circumvent compression issues impacting single-pole (spherical and interaural) coordinate systems, further discussed in Section 2.1.3. They can prove very helpful for some types of data presentation [5], yet can confuse the analysis as more than one coordinate vector can be assigned to any given point in space.

#### *2.1.2 Azimuth, elevation, lateral, and polar errors*

Regardless of the coordinate system used, angular errors can be calculated using either the *signed* or the *absolute difference* between target and response coordinates. The signed error gives an indication of the "localisation bias" [5], whereas the absolute error, more often used in the literature [7–10], provides a measure of how close a response is to the target, regardless of error direction. Computing summary statistics from these values is a straightforward first step to characterise both the central tendency and the dispersion, or "localisation blur" [11], of participant responses [1].

Care must be taken in calculating signed and absolute errors because of the discontinuities in the azimuth and polar angles of the spherical and interaural coordinate systems. If a source is close to the discontinuity and the response crosses it (*e.g.* a target at 179° and a response at −179°), the calculated error will be artificially large. Likewise, summary statistics such as the mean or standard deviation should be computed away from those discontinuities. Another problem that results from working with egocentric systems is that data distributions are warped by the curvature of the sphere, which in theory requires the use of circular statistics when comparing statistical distributions. As discussed in [1], linear statistics can however be used in practice if the directional judgements are relatively well concentrated around a central direction.
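One way to avoid the discontinuity problem is to wrap the angular difference back into a single 360° interval before using it, and to average angles through their sine and cosine components. The sketch below illustrates this under the assumption that angles are expressed in degrees; the function names are hypothetical.

```python
import math

def wrap_deg(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def signed_error(target_deg, response_deg):
    """Signed error (response minus target), wrapped across the discontinuity."""
    return wrap_deg(response_deg - target_deg)

def absolute_error(target_deg, response_deg):
    """Absolute error, regardless of error direction."""
    return abs(signed_error(target_deg, response_deg))

def circular_mean_deg(angles_deg):
    """Circular mean of a list of angles, in degrees.

    The result is not meaningful when the summed sine and cosine
    components are both (close to) zero, e.g. for uniformly spread angles.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))
```

For a target at 179° and a response at −179°, the naive difference is −358°, whereas `signed_error(179, -179)` gives 2°.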

#### *2.1.3 Compensating for spatial compression*

Both the spherical and interaural coordinate systems introduce spatial compression at their poles. In the interaural coordinate system, for example, the circumference of the cone-of-confusion at 80° lateral angle is much smaller than that of a cone at 0° lateral angle. Polar angle errors near the poles (near ±90° lateral angle) are therefore exaggerated relative to those near the median plane. The same problem affects azimuth errors near the poles (near ±90° elevation angle) of the spherical coordinate system.

Previous studies have sought to avoid the spatial compression problem altogether by limiting the analysis to targets away from the poles [12]. The downside of this method is that it limits the scope of the study's conclusions, because a large region of space cannot be studied. Others have proposed compensation schemes, using for example the lateral angle to weight the response contribution to the average polar error [13–15]. Carlile et al. [13], for example, weighted polar response errors using the cosine of the target lateral angle, decreasing response contributions as targets moved towards the interaural axis. This method more accurately reflects the arc length between the target and response locations on the circle, keeping in mind that this weighting does not take the lateral angle of the response into account.
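The cosine weighting described above can be illustrated as follows. This is a simplified sketch of the idea, not the exact procedure of the cited studies: the wrapped polar error is scaled by the cosine of the target lateral angle, which is proportional to the radius of the cone-of-confusion circle on the unit sphere. Function names are hypothetical.

```python
import math

def wrap_deg(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def weighted_polar_error(target_lat_deg, target_pol_deg, response_pol_deg):
    """Signed polar error scaled by the cosine of the target lateral angle.

    Only the target lateral angle is used, so the lateral angle of the
    response is ignored, as noted in the text.
    """
    polar_error = wrap_deg(response_pol_deg - target_pol_deg)
    return polar_error * math.cos(math.radians(target_lat_deg))
```

Under this weighting, a 40° polar error counts fully for a target on the median plane, counts as 20° for a target at 60° lateral angle, and vanishes as the target approaches the interaural axis.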

#### *2.1.4 Using directional statistics to analyse sound localisation accuracy*

Due to the discontinuities and spatial compression in the angular metrics of the typical coordinate systems, some work has simply examined the distance between the participant responses and the true target positions to assess the extent of localisation error. The most basic method, the *great-circle error* used in several studies [9, 15, 16], is the distance measured along the surface of the unit sphere between the response and target locations. The great-circle error is independent of the selected coordinate system and is not affected by the issues related to discontinuities in the axes or to spatial compression.
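As an illustration, the great-circle error can be computed from the unit vectors of the target and response directions. The sketch below assumes spherical coordinates in degrees with +X to the front, +Y to the left and +Z up; it uses the `atan2` form of the angle between two vectors, which remains numerically stable for very small and very large angles. Function names are hypothetical.

```python
import math

def unit_vector(azim_deg, elev_deg):
    """Unit vector for a direction given as (azimuth, elevation) in degrees."""
    az, el = math.radians(azim_deg), math.radians(elev_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def great_circle_error(t_az, t_el, r_az, r_el):
    """Angle in degrees, within [0, 180], between target and response directions."""
    t = unit_vector(t_az, t_el)
    r = unit_vector(r_az, r_el)
    dot = sum(a * b for a, b in zip(t, r))
    cross = (t[1] * r[2] - t[2] * r[1],
             t[2] * r[0] - t[0] * r[2],
             t[0] * r[1] - t[1] * r[0])
    cross_norm = math.sqrt(sum(c * c for c in cross))
    return math.degrees(math.atan2(cross_norm, dot))
```

A target at azimuth 179° and a response at azimuth −179°, both at 0° elevation, yield an error of 2°, and two directions at 85° elevation on opposite sides of the pole yield 10°, with no special handling of the discontinuity or of the pole.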

Great-circle error on its own does not provide information about the direction of the response. Paired with the *angular direction*, it becomes a vector that fully describes the difference between the response and target positions [1]. Similar to the *bearing* used to navigate on the globe, the angular direction is the angle between the direction from the target towards the positive pole and the direction from the target towards the response. This vector can be used to compute the mean position of the responses, or *centroid*, and to perform directional or spherical statistics. Alternatively, the centroid of the response locations may be calculated by separately summing the x, y, and z coordinates of the responses and dividing by the resultant length [17, 18], though this method may yield undesirable results in edge cases where the locations are widely scattered on the sphere.
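Both quantities can be sketched in a few lines. The angular direction below uses the standard initial-bearing formula from navigation, with azimuth playing the role of longitude and elevation that of latitude; the sign convention (positive towards increasing azimuth) is an assumption, as are the function names. The centroid follows the coordinate-summing approach and returns `None` when the resultant length is close to zero, which is the edge case mentioned above.

```python
import math

def unit_vector(azim_deg, elev_deg):
    """Unit vector for a direction given as (azimuth, elevation) in degrees."""
    az, el = math.radians(azim_deg), math.radians(elev_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_direction(t_az, t_el, r_az, r_el):
    """Direction of the response as seen from the target, in degrees.

    0 deg points towards the positive pole (+90 deg elevation);
    positive values point towards increasing azimuth (assumed convention).
    """
    d_az = math.radians(r_az - t_az)
    te, re = math.radians(t_el), math.radians(r_el)
    y = math.sin(d_az) * math.cos(re)
    x = math.cos(te) * math.sin(re) - math.sin(te) * math.cos(re) * math.cos(d_az)
    return math.degrees(math.atan2(y, x))

def centroid(responses, eps=1e-9):
    """Mean direction of a list of (azimuth, elevation) responses.

    Returns (azimuth, elevation, mean resultant length), or None when the
    summed vector is too short for a mean direction to be defined.
    """
    vectors = [unit_vector(az, el) for az, el in responses]
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    sz = sum(v[2] for v in vectors)
    resultant = math.sqrt(sx * sx + sy * sy + sz * sz)
    if resultant < eps:
        return None
    azim = math.degrees(math.atan2(sy / resultant, sx / resultant))
    elev = math.degrees(math.asin(max(-1.0, min(1.0, sz / resultant))))
    return azim, elev, resultant / len(responses)
```

The mean resultant length returned alongside the centroid lies between 0 and 1 and decreases as the responses spread over the sphere, so a small value signals that the centroid should be interpreted with caution.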

To perform statistical analyses of the localisation accuracy, the variance in the response locations must be quantified [19, 20]. Given the two-dimensionality of the data, previous work has used Kent distributions on a sphere [17, 21] to determine ellipses that portray the variance of the data along the major and minor axes of the spread of the responses. With Kent distributions, circular statistical tests may be conducted to evaluate the significance of the distance between the centroid of the responses and the target location (such as the Rayleigh *z* test) or of the differences between mean response locations for different conditions (such as the Watson two-sample *U*<sup>2</sup> test) [22]. Alternatively, Wightman and Kistler [18] suggest the use of the "concentration parameter" *κ* to characterise the variance, or "dispersion", of the response locations on the sphere.
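To give a feel for how a concentration parameter relates to the spread of the responses, the sketch below estimates *κ* from the mean resultant length using a widely used closed-form approximation for the von Mises-Fisher distribution on the sphere, κ ≈ R̄(3 − R̄²)/(1 − R̄²). This estimator is an assumption chosen for illustration and is not necessarily the one used in [18]; function names are hypothetical.

```python
import math

def unit_vector(azim_deg, elev_deg):
    """Unit vector for a direction given as (azimuth, elevation) in degrees."""
    az, el = math.radians(azim_deg), math.radians(elev_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def mean_resultant_length(responses):
    """Length of the mean of the response unit vectors, between 0 and 1."""
    vectors = [unit_vector(az, el) for az, el in responses]
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(3)]
    return math.sqrt(sum(m * m for m in mean))

def concentration_kappa(responses):
    """Approximate concentration parameter (larger means tighter clustering)."""
    r_bar = mean_resultant_length(responses)
    denom = 1.0 - r_bar ** 2
    if denom <= 0.0:
        # All responses coincide: the concentration is unbounded.
        return math.inf
    return r_bar * (3.0 - r_bar ** 2) / denom
```

Responses clustered within a couple of degrees of a common direction give a *κ* in the thousands, whereas responses spread over tens of degrees give a *κ* of the order of ten.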

#### *2.1.5 Further high level metrics based on angular distances*

The *spherical correlation coefficient* has been used to provide an overall measure of the correlation between target and response positions [13, 17, 18]. As with standard correlation, the spherical correlation coefficient ranges from −1 to 1, where a value of 1 is obtained for two identical data sets, and a value of −1 is obtained for two sets that are reflections of one another. By construction, the spherical correlation coefficient is invariant to global rotations between the two sets.

Rather than looking at single or mean error values to assess localisation accuracy, Hofman et al. [23] and Trapeau et al. [24] studied the linear regression between target and response elevation angles. Termed "elevation gain", the slope of this regression provides a higher-level metric that can be used to detect compression or dilation effects in participant responses. Van Wanrooij and Van Opstal [25] extended this technique, applying the regression to target versus response azimuth as well as elevation angles. To account for the azimuthal dependence of the elevation gain, they also introduced the notion of "local elevation gain", averaging elevation gain values based on a sliding azimuthal window. This metric allows the assessment of how elevation compression and dilation effects impact different regions of the sphere.
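The elevation gain can be illustrated with an ordinary least-squares fit of response elevation against target elevation. The sketch below is a minimal version under that assumption; the cited studies may use different fitting details, and the function name is hypothetical.

```python
def elevation_gain(target_elev_deg, response_elev_deg):
    """Least-squares slope and intercept of response vs. target elevation.

    Returns (gain, bias): a gain below 1 indicates compression of the
    responses, a gain above 1 indicates dilation.
    """
    n = len(target_elev_deg)
    if n < 2 or n != len(response_elev_deg):
        raise ValueError("need two equally long lists with at least two trials")
    mean_t = sum(target_elev_deg) / n
    mean_r = sum(response_elev_deg) / n
    sxx = sum((t - mean_t) ** 2 for t in target_elev_deg)
    if sxx == 0.0:
        raise ValueError("target elevations must not all be identical")
    sxy = sum((t - mean_t) * (r - mean_r)
              for t, r in zip(target_elev_deg, response_elev_deg))
    gain = sxy / sxx
    bias = mean_r - gain * mean_t
    return gain, bias
```

A local variant could, for instance, apply the same fit to the subset of trials whose target azimuth falls inside a window that slides around the listener.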

### **2.2 Analysis based on confusions classification**

#### *2.2.1 Confusions classification*

An analysis based on angular distances alone would fail to distinguish local accuracy errors from critical space confusions, where responses are often in the opposite hemisphere to the target positions. These kinds of errors are very common in studies using non-individualised HRTFs [8, 10, 26, 27], though they also occur when listening with one's own ears or HRTF [5].

One of the simplest techniques is that used by Honda et al. [28], which defines a hit-miss criterion based on a threshold great-circle error value. Though intuitive, the method does not provide much information on the nature or potential origin of the confusions.
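A hit-miss criterion of this kind reduces to a threshold test on the great-circle error. In the sketch below, the 20° default threshold is an arbitrary placeholder and not a value taken from [28]; function names are hypothetical, and the great-circle errors are assumed to have been computed beforehand.

```python
def is_hit(great_circle_error_deg, threshold_deg=20.0):
    """True when the response lies within the threshold of the target."""
    return great_circle_error_deg <= threshold_deg

def hit_rate(great_circle_errors_deg, threshold_deg=20.0):
    """Proportion of responses classified as hits."""
    if not great_circle_errors_deg:
        raise ValueError("at least one trial is required")
    hits = sum(1 for e in great_circle_errors_deg if is_hit(e, threshold_deg))
    return hits / len(great_circle_errors_deg)
```

With this criterion, a 40° error and a 170° front-back reversal are both simply counted as misses, which illustrates why the method says little about the nature of the confusions.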

A slightly more elaborate form of confusion classification was used by Middlebrooks [12], which flags responses as confusions when they are in a different hemisphere from that of the target. To avoid reporting small local accuracy errors as confusions for targets near the hemisphere boundaries, only those responses with polar angle errors greater than 90° were considered when searching for confusions. The classification thus resulted in three types of "quadrant confusions": front-back, up-down, and left-right. Majdak et al. [14] further improved the definition, introducing a weighting factor to compensate for polar angle compression near the interaural axis. A comparable strategy was adopted by Carlile et al. [13], excluding from confusion checks those targets too close to the interaural axis.
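The hemisphere-based rule can be sketched as follows, working in interaural coordinates. This is a simplified reading of the description above, not the exact procedure of [12]: a response is checked for confusions only when its wrapped polar error exceeds 90°, and each hemisphere pair is then compared between target and response. The sign conventions (polar angle 0° to the front and positive upwards, lateral angle positive to the left) and all names are assumptions.

```python
def wrap_deg(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def hemispheres(lat_deg, pol_deg):
    """Hemisphere labels of a direction given as (lateral, polar) in degrees."""
    pol = wrap_deg(pol_deg)
    return {
        "front-back": "front" if abs(pol) <= 90.0 else "back",
        "up-down": "up" if pol >= 0.0 else "down",
        "left-right": "left" if lat_deg >= 0.0 else "right",
    }

def quadrant_confusions(t_lat, t_pol, r_lat, r_pol, min_polar_error_deg=90.0):
    """List of confusion types for one trial (empty when none is flagged)."""
    polar_error = abs(wrap_deg(r_pol - t_pol))
    if polar_error <= min_polar_error_deg:
        # Small polar errors are treated as local errors, not confusions.
        return []
    target_hemi = hemispheres(t_lat, t_pol)
    response_hemi = hemispheres(r_lat, r_pol)
    return [name for name in target_hemi
            if target_hemi[name] != response_hemi[name]]
```

A target at polar angle 80° and a response at 100° lie in different front-back hemispheres but differ by only 20°, so no confusion is reported, whereas a target at 30° and a response at 150° are flagged as a front-back confusion.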

A parallel classification was proposed by Martin et al. [29], determining confusion types based on cone-of-confusion angle values rather than sphere quadrants. The classification was further refined by Yamagishi and Ozawa [30], Parseihian and Katz [8] and Zagala et al. [16], adding "precision" and "combined" confusions to the already existing confusion types. This classification is discussed in more detail in Section 3.1.4.

#### *2.2.2 Separating angular and confusions errors contributions*

Given the relatively high incidence of front-back confusions in non-individual HRTF localisation tasks, results often exhibit a bi-modal distribution [10]. Analyses applied to data that contain a large portion of front-back confusions will have large variance and potentially inaccurate averages. The other confusion types also have a similar, if somewhat less characteristic, impact on the data, artificially inflating localisation errors. As such, it is common practice to split the data to analyse confusions separately from *local* performance [1, 12, 14, 31]. A potential problem with this approach is that excluding data from an analysis may result in an unbalanced data set, which limits the use of classical repeated-measures statistics.

Another approach that preserves the sample size of the data consists of 'folding' the responses into the same subspace as that of the target prior to the analysis. This technique has only ever been applied to mirror front-back confusions [18], as it may only apply to very specific circumstances and tends to inflate the power of the resulting conclusions [1].
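For front-back confusions, folding amounts to mirroring the response across the frontal plane whenever it falls in the opposite front-back hemisphere to the target. In interaural coordinates this leaves the lateral angle unchanged and maps the polar angle β to 180° − β. The sketch below assumes a polar angle of 0° to the front and ±180° to the back; function names are hypothetical.

```python
def wrap_deg(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def is_front(pol_deg):
    """True when the polar angle lies in the frontal hemisphere."""
    return abs(wrap_deg(pol_deg)) < 90.0

def fold_front_back(target_pol_deg, response_pol_deg):
    """Polar angle of the response after folding into the target's hemisphere."""
    if is_front(target_pol_deg) == is_front(response_pol_deg):
        # Same front-back hemisphere: the response is left untouched.
        return wrap_deg(response_pol_deg)
    # Mirror across the frontal plane (x -> -x), i.e. polar -> 180 - polar.
    return wrap_deg(180.0 - response_pol_deg)
```

For a target at polar angle 30°, a response at 150° is folded to 30° and therefore contributes a zero polar error, which illustrates how folding can make performance look better than it is.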

*HRTF Performance Evaluation: Methodology and Metrics for Localisation Accuracy… DOI: http://dx.doi.org/10.5772/intechopen.104931*
