**1. Introduction**

Head-related transfer functions (HRTFs) describe the filtering of the acoustic field produced by a sound source arriving at the listener's ears. The filtering is the effect of the interaction of the sound field with the listener's anatomy and has several components. First, the incoming sound wave arrives at the ipsilateral pinna, i.e., the ear closer to the sound source, and then at the contralateral ear, i.e., the ear farther away from the sound source. This time difference between the ipsilateral and contralateral ear is usually described as the interaural time difference (ITD). Second, larger anatomical structures, i.e., torso, shoulders and head, affect frequencies up to 3 kHz in a comparatively simple way. As the listener's torso and head shadow the sound wave arriving at the contralateral ear, interaural level differences (ILDs) arise. Third, the incoming sound is filtered in a complex way by the shape of the listener's pinnae. These monaural time-frequency-filtering effects become especially important for higher frequency regions (above approximately 4 kHz) and for sound directions inducing the same ITDs and ILDs [1–6]. Humans have learned to interpret this acoustic filtering to span an auditory space as an internal model of their natural environment [7]. Because the pinna shape is unique for every person, HRTFs are considered listener-specific [8–10], similar to a fingerprint [1–6]. With an individually fitted HRTF dataset, it is possible for a person to perceive sounds presented via headphones (in a virtual environment) as if they originated from their (physical) positions around the listener.
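The ITD mentioned above is often approximated analytically. As an illustration (not part of this chapter's methods), Woodworth's classic spherical-head formula estimates the ITD from the azimuth angle; the head radius and speed of sound used below are assumed typical values:

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth's spherical-head ITD approximation (illustrative sketch).

    azimuth_deg: source azimuth re. front (0 = front, 90 = fully lateral).
    head_radius: assumed average head radius in metres.
    c: assumed speed of sound in m/s.
    Returns the ITD in seconds.
    """
    theta = math.radians(abs(azimuth_deg))
    # Path difference: arc around the head (r*theta) plus direct part (r*sin(theta))
    return head_radius / c * (theta + math.sin(theta))
```

For a fully lateral source (90° azimuth), this yields roughly 0.66 ms, in line with typical maximum ITDs for adult heads.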

Both interaural and monaural features for a single sound direction can be represented by a binaural HRTF pair [11]. In signal processing terms, a binaural HRTF pair can be described as

$$\begin{aligned} \text{HRTF}_{L}(\mathbf{x}^*, f, s) &= \frac{p_L(\mathbf{x}^*, f, s)}{p_0(\mathbf{0}, f)} \\ \text{HRTF}_{R}(\mathbf{x}^*, f, s) &= \frac{p_R(\mathbf{x}^*, f, s)}{p_0(\mathbf{0}, f)} \end{aligned} \tag{1}$$

where *p*<sub>*L*</sub> and *p*<sub>*R*</sub> describe the sound pressure at a position inside the left and right ear, respectively (typically the entrance of the left and right ear canal or a position close to the eardrum), **x**<sup>∗</sup> describes the sound-source position (i.e., distance and direction), *f* describes the frequency and *s* the listener's geometry, emphasising the listener-specificity of HRTFs. *p*<sub>0</sub> describes the reference sound pressure, which is usually the pressure measured at the midpoint between the left and right ear *without* the head being present.
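Eq. (1) translates directly into a frequency-domain division of measured responses. The following sketch (the function name and the regularisation constant are our own, not from the chapter) estimates a binaural HRTF pair from time-domain impulse responses measured at the two ears and at the reference position:

```python
import numpy as np

def hrtf_pair(p_left, p_right, p_ref, n_fft=None):
    """Estimate a binaural HRTF pair per Eq. (1): divide the spectra of the
    left/right in-ear impulse responses by the spectrum of the reference
    response measured at the head centre without the listener present.

    Inputs are 1-D time-domain impulse responses; returns the complex
    left and right HRTF spectra (illustrative sketch).
    """
    n_fft = n_fft or max(len(p_left), len(p_right), len(p_ref))
    P_L = np.fft.rfft(p_left, n_fft)
    P_R = np.fft.rfft(p_right, n_fft)
    P_0 = np.fft.rfft(p_ref, n_fft)
    # Small regularisation avoids division by near-zero reference bins
    eps = 1e-12 * np.max(np.abs(P_0))
    return P_L / (P_0 + eps), P_R / (P_0 + eps)
```

In practice, measured responses additionally require windowing and equipment equalisation; the division above shows only the core of the definition.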

There are several options to set a specific coordinate system to systematically describe directions for HRTFs. From the physical perspective, the *spherical* coordinate system is a natural choice; in that case, the origin of the system is placed inside the listener's head at the midpoint between the left and right ear, and the direction is described by azimuth and elevation angles, see **Figure 1a**. In this system, one can intuitively define the two main planes: the eye-level horizontal plane, i.e., all directions with an elevation angle of zero, and the median plane, i.e., all directions with an azimuth angle of zero. The eye-level horizontal plane is also called the Frankfurt plane and can be anatomically defined as the plane connecting the lowest part of the listener's orbital cavity and the highest part of the bony ear canal (meatus acusticus externus osseus). This spherical coordinate system resembles the *geodetic* representation widely used in physics, with the poles located at the top and bottom. An alternative system that is more relevant from the auditory perspective is given by the *interaural-polar* coordinate system.

#### **Figure 1.**

*Coordinate systems typically used in the HRTF acquisition and representation. The dashed line represents the interaural axis, and the arrow represents the viewing direction. (a) Spherical coordinate system with the azimuth and elevation angles. (b) Simple interaural-polar coordinate system with the lateral and polar angles obtained by rotating the poles of the spherical system. (c) Modified interaural-polar coordinate system with the lateral and polar angles corresponding to the azimuth angle in the horizontal plane and the elevation angle in the median plane.*

### *Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

The interaural-polar system is shown in **Figure 1b** and can be constructed by rotating the poles of the spherical system onto the interaural axis, i.e., the axis connecting the two ears. A sound direction is then described by the lateral angle (along the horizontal plane) and the polar angle (along the median plane). The poles are then located at the left and right sides of the listener. This simple interaural-polar coordinate system was used in various psychoacoustic studies, e.g., [12, 13], but has the disadvantage that the lateral angle does not correspond to the azimuth angle. **Figure 1c** shows the *modified* version of the interaural-polar coordinate system, which does not have this disadvantage. Here, the sign of the lateral angle is flipped, i.e., positive lateral angles are used for sounds located at the left side of the listener. This transformation to a left-handed coordinate system has the advantage that the lateral angle corresponds to the azimuth angle for all sources placed in the horizontal plane, and the polar angle corresponds to the elevation angle for all sources placed in the median plane. Thus, the modified interaural-polar coordinate system offers a better link between psychoacoustic research and audio engineering. In that system, the lateral angle ranges from −90° (right ear) over 0° (front) to 90° (left ear), and the polar angle ranges from −90° (bottom) over 0° (front) and 90° (top) to 180° (back) and 270° (bottom again).
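The relation between the spherical and the modified interaural-polar systems can be written down compactly. The sketch below assumes the azimuth angle increasing towards the listener's left and the elevation angle increasing upwards; the function name and clamping details are our own:

```python
import math

def sph_to_interaural(azimuth_deg, elevation_deg):
    """Convert spherical (azimuth, elevation) to modified interaural-polar
    (lateral, polar) coordinates; all angles in degrees.

    Assumed convention: azimuth positive to the listener's left, elevation
    positive upwards; the polar angle is returned in [-90, 270).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Cartesian direction on the unit sphere: x front, y left, z up
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    # Lateral angle: displacement from the median plane along the interaural axis
    lateral = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    # Polar angle: rotation within the plane of constant lateral angle
    polar = math.degrees(math.atan2(z, x))
    if polar < -90.0:  # wrap to [-90, 270)
        polar += 360.0
    return lateral, polar
```

In the horizontal plane the lateral angle equals the azimuth angle, and in the median plane the polar angle equals the elevation angle (continuing past 90° towards the back), matching the properties of the modified system described above.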

The understanding of these coordinate systems is important because state-of-the-art acquisition and representation of HRTFs utilise those systems. For example, **Figure 2** shows HRTFs along the Frankfurt and the median plane. These coordinate systems are used in HRTF visualisation, in various HRTF-related software packages such as the SOFA toolbox [15], and in auditory modelling, e.g., the Auditory Modelling Toolbox (AMT) [16, 17].

HRTF acquisition can be classified into three categories: acoustic measurement, numerical calculation, and personalisation [18].

The acoustic measurement is traditionally designed as the measurement of the impulse response between source and receiver in an anechoic or semi-anechoic chamber, describing the transmission path from a sound source to the ear [11, 19]. A comprehensive review of the established state-of-the-art acoustic techniques to measure HRTFs can be found in [20]. Thus, in Section 3 of this chapter, we only briefly provide an overview of the traditional acoustic HRTF measurement approaches, highlight some of their differences and new trends, and focus on the requirements for the acoustic measurement.

#### **Figure 2.**

*HRTF magnitude spectra for the listeners (a) NH236 and (b) NH257, both from the ARI database [14]. Top: Spectra along the median plane. Bottom: Spectra along the eye-level horizontal plane. 0 dB corresponds to the maximum magnitude in each panel.*

Numerical HRTF calculation simulates the acoustic measurement by considering a 3D representation of the listener's geometry and the positions of multiple external sound sources, for which the generated sound pressure at the entrance of the ear canal is calculated. This technique has become increasingly popular and is the main focus of this chapter. To this end, in Section 4, we provide an overview of the principles of various numerical calculation approaches, including a comparison of these methods.

Personalisation of HRTFs describes the process of adapting an existing set of generic data guided by listener-specific information, with the help of either objective or subjective personalisation methods. The objective personalisation has been approached from two different domains: the geometric domain, in which listener-specific anthropometric data are measured and used to personalise a generic geometric model from which HRTFs are then simulated; and the spectral domain, in which a generic HRTF set is directly personalised based on listener-specific information. Examples of personalisation approaches include frequency scaling [21], parametric modelling of peaks and notches [22], active shape modelling (ASM) [23], principal component analysis (PCA) in both the geometric [24] and spectral domains [25–29], multiple regression analysis [30], independent component analysis (ICA) [31], large deformation diffeomorphic metric mapping (LDDMM) [25, 32], local neighbourhood mapping [33], neural networks [34–41] and linear combinations of HRTFs [42]. Despite many efforts worldwide [43–46], the link between the morphology and HRTFs is not fully understood yet, mostly because of the high dimensionality of the problem. The most recent tools for studying that link are rooted in aligning high-resolution pinna representations to target representations, facilitated by parametric pinna models [47, 48].
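As a concrete example of a spectral-domain approach, PCA-based personalisation represents HRTF magnitude spectra in a low-dimensional basis learned from a database; a listener-specific spectrum is then reconstructed from a few weights, which would in practice be predicted, e.g., from anthropometric data. A minimal sketch, with our own function names and a plain SVD standing in for a full pipeline:

```python
import numpy as np

def pca_basis(hrtf_mags, n_components=10):
    """Learn a PCA basis from a database of HRTF log-magnitude spectra.

    hrtf_mags: array of shape (n_listeners, n_freq_bins).
    Returns the mean spectrum and the leading principal components.
    """
    mean = hrtf_mags.mean(axis=0)
    centered = hrtf_mags - mean
    # SVD yields the principal components as rows of vt
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def personalise(mean, basis, weights):
    """Reconstruct a listener-specific spectrum from PCA weights
    (in a real system, the weights are predicted from listener data)."""
    return mean + weights @ basis
```

With enough components, a spectrum from the training database is reconstructed exactly; personalisation amounts to choosing the weights for an unseen listener.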

In the subjective personalisation, listeners are confronted with several sets of HRTFs, and an algorithm (usually based on the evaluation of localisation errors, i.e., the difference between the perceived and actual sound-source location) adapts the HRTF sets, aiming to converge on listener-specific HRTFs [9, 49]. For an educated guess of the initial sets, anthropometric data can be used to pre-scale the HRTF sets, or the HRTF sets can be pre-selected via psychoacoustic models [50]. Clustering of the HRTF sets can further improve the relevance and reduce the duration of the personalisation procedure [49, 51].
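A subjective personalisation procedure of this kind can be sketched as an error-driven selection loop. The structure below is our own simplification; a real procedure would obtain the errors from listening tests and exploit the pre-selection and clustering mentioned above:

```python
def subjective_selection(candidate_sets, localisation_error, n_rounds=3):
    """Sketch of subjective HRTF personalisation: repeatedly keep the half
    of the candidate HRTF sets with the smallest localisation error
    (difference between perceived and actual sound-source location).

    localisation_error(hrtf_set) stands in for a listening-test result.
    """
    sets = list(candidate_sets)
    for _ in range(n_rounds):
        if len(sets) <= 1:
            break
        sets.sort(key=localisation_error)   # best-performing sets first
        sets = sets[: max(1, len(sets) // 2)]
    return sets[0]
```

The number of rounds and the halving schedule control the trade-off between procedure duration and how close the final set comes to the listener-specific HRTFs.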

All these methods aim at providing a specific quality in terms of acoustic and psychoacoustic properties. In the following section, we describe the acoustic properties and psychoacoustic requirements of human HRTFs, both of which lay the basis for HRTF acquisition. Then, we briefly describe the most important requirements for the acoustic HRTF measurement, complementing the work of Li and Peissig [20]. Finally, we describe approaches for numerical HRTF calculation in greater detail.
