**1. Introduction**

If you have reached this point, you are probably familiar with the concept of binaural rendering. You likely also know that it is used for producing spatial sound over headphones in most of today's personal mixed reality experiences. While conceptually sound, binaural rendering is subject to several limitations in practice, some of them leading users to perceive distorted versions of the encoded 3D scene. Those distortions range from slight localisation blur to critical scenarios where auditory events are perceived in the opposite hemisphere from their actual position. Researchers have been working on techniques to address this problem of binaural localisation accuracy for some time now. To establish the benefit of these techniques, they predominantly, and quite naturally, rely on localisation performance evaluations.

The problem that concerns us here is that there is no standard for said evaluation. As a consequence, fully appreciating the value of a technique often requires careful reading and interpretation of both protocols and associated results. This becomes truly problematic when comparing the results of several studies, where differences in protocol and evaluation metrics make the analysis complicated at best and simply impossible in some cases. Without inter-study comparison, it becomes hard to reach any conclusion on the overall and added value of an HRTF selection, synthesis, or learning method. The objective of this chapter is to lay the foundations of such a standard.

#### **1.1 Context**

One of the most frequent causes of auditory space distortion in binaural rendering is related to the use of *non-individual* Head Related Transfer Functions (HRTF)<sup>1</sup>. An HRTF is a collection of filter pairs that, applied to a mono signal, modify it so that it has the same characteristics as if it had physically travelled from a specific point in space to our ears. Each filter pair in the set corresponds to a different source position, and the positions typically form a sphere of fixed radius around the listener. When sound travels to our ears, the interactions of the acoustic wave with our morphology cause deformations in the perceived signal. From childhood, our brain has learned to interpret these acoustic cues as different source positions. Since there exist many variations of ear, head, and torso shapes that each deform the sound differently, so too are there variations in HRTFs. While we are quite adept at sound localisation with our own ears and our own HRTF, the problem arises when we start using someone else's.
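As an illustration of what "applying a filter pair to a mono signal" means, the following is a minimal sketch in Python, assuming the filter pair is available in its time-domain form (a pair of head-related impulse responses, or HRIRs). The function name `render_binaural` and the toy impulse responses are hypothetical and introduced only for this example: the left filter is a unit impulse and the right filter is a delayed, attenuated impulse, which crudely mimics a source located to the listener's left. They are not measured data; a real HRIR pair would come from a measured or simulated HRTF set.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with one HRIR pair (one source position).

    Returns an array of shape (n_samples, 2) holding the left and right
    ear signals. Both HRIRs are assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)


# Toy HRIR pair (hypothetical, NOT measured data): the right ear receives
# the sound 30 samples later (about 0.6 ms at 48 kHz) and at half amplitude.
fs = 48000
itd_samples = 30
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[itd_samples] = 0.5

# 100 ms of white noise as the mono source signal.
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs // 10)

stereo = render_binaural(mono, hrir_left, hrir_right)
```

In a complete renderer, the pair of filters would be selected (or interpolated) from the HRTF set according to the source position relative to the listener, which is where the choice between an individual and a non-individual HRTF takes effect.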

In practice, most users will end up experiencing binaural rendering using an HRTF that is not their own, that is, a non-individual HRTF, generally taken from an existing database. Presently, measuring an individual's HRTF most often requires specific equipment and access to an anechoic room. Methods exist to simulate an HRTF from geometrical head scans or morphological data, but they suffer from similar drawbacks: the techniques are either too costly or burdensome to implement in practical scenarios, or they produce HRTFs that do not exactly match those of the individual user. As mentioned, using a non-individual HRTF, which the brain has not trained with, often results in distortions of the perceived auditory space. Researchers have been working on this issue, proposing new simulation methods, HRTF selection processes, and even HRTF training programs focused on the reduction of these distortions.

Naturally, all these lines of research end up using a localisation evaluation task to assess the benefit of new techniques. As mentioned above, there exists no standard method for this evaluation, hindering results appraisal and inter-study comparisons.

#### **1.2 Chapter scope and organisation**

The objective of this chapter is to outline a set of metrics and propose a methodology to assess localisation performance in the context of HRTF selection and training programs. While the tools proposed can be applied to other contexts, they were designed with HRTF training in mind as not only do they assess instantaneous performance but also performance *evolution*, adding another dimension to the analysis workflow.

Section 2 presents a state of the art of evaluation metrics used to assess localisation accuracy in previous studies. Section 3 introduces the proposed methodology and the set of metrics on which it is built. Section 4 is a case study, using the methodology to re-analyse and compare the results of five contemporary experiments on HRTF learning. Section 5 concludes this chapter.

<sup>1</sup> We use the term *individual* to identify the HRTF of the user, *individualised* or *personalised* to indicate an HRTF modified or selected to best accommodate the user, and *non-individual* or *non-individualised* to indicate an HRTF that has not been tailored to the user. So-called *generic* or *dummy-head* HRTFs are specific instances of non-individual HRTFs.
