**3. Acoustic measurement**

The principle of an acoustic HRTF measurement relies on the system identification of the HRTF considered as a linear and time-invariant system. Here, an HRTF describes the propagation path between a microphone and a loudspeaker. Because of the binaural synchronisation, HRTFs are measured simultaneously at the two ears. The measurements are commonly performed for many source positions because of the required high spatial resolution. Recently, the details of the acoustic measurements, including a comprehensive list of HRTF measurement sites has been reviewed [20]. Thus, we only briefly introduce the basics and focus on the most recent advances in the acoustic HRTF measurement.

Typically, two omnidirectional microphones are placed in both ear canals, and the loudspeakers are arranged around the listener, ideally, with the number of loudspeakers corresponding to the number of HRTF positions to be measured. **Figure 6** shows two examples of measurement setups of various complexity: In **Figure 6a**, the listener is located on a turntable and moves within a fixed nearcomplete circular loudspeaker array. **Figure 6b** shows a similar approach with a near-complete spherical loudspeaker array, and **Figure 6c** shows the placement of a microphone in the ear canal so that it is membrane lines up with the entrance of the

#### **Figure 6.**

*(a) Example of an HRTF measurement setup with mechanical rotation required. Listener sits on a chair (mounted on a turntable) surrounded by a loudspeaker arc (22 loudspeakers ranging from 30 to 210° in 5° steps). A head-tracker mounted on the head of the listener tracks head movements, triggering the need for measurement repetition in case of too large movements. (b) Example of more recent HRTF setups. 91 loudspeakers are mounted in a near-complete spherical array reducing the total measurement duration. (c) Example for a microphone placement in an HRTF measurement. Note the closed ear canal and the headtracker sensor.*

### *Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

ear canal. Actually, it does not matter whether the microphones or loudspeakers are placed in the ear canal—this approach of 'reciprocity' is usually facilitated in numeric HRTF calculations (Section 4.4). However, setups with loudspeakers in the ears [102] lack signal-to-noise ratio (SNR) as the amplitude of the source signal needs to be low enough to not harm the listener, making the setup impractical for experiments. With the microphones in the ears, the most simple setups consist of a single loudspeaker moved around the listener [103]. Unfortunately, such setups lead to a long measurement duration for a dense set of HRTF positions. With the increasing availability of multichannel sound interfaces and adequate electroacoustic equipment, over the decades, the number of actually used loudspeakers increased. Setups with only a single loudspeaker moving around the listener have been replaced by setups with loudspeaker arcs surrounding the listener. In those setups, the listener sits on a turntable and either the listener (e.g., **Figure 6a**) or the loudspeaker arc is rotated [88, 104].

Recent approaches follow one of two different directions; On the one hand, generic and individual HRTFs are measured with a growing number of loudspeakers used in specialised facilities [67, 95]. Some even with such a large amount of loudspeakers that the listener is rotated for a few discrete positions, and postprocessing algorithms interpolate between HRTF directions, e.g., the setup in **Figure 6b**. On the other hand, user-friendly individual HRTF measurement approaches are suggested, showing a trend towards decreasing the complexity of the measurement setup and using widely available equipment. In these approaches, only a single speaker is used and the listener is asked to move the head until a dense setup of HRTF directions can be obtained. These measurements enable simple systems to be used at home [105, 106], in which a head-tracking system records the listener's head movements in real time and adapts the measured spatial HRTF grid. Head-above-torso orientations have to be considered additionally [100], but they reduce the complexity of the measurement setup and enable using widely available equipment, e.g., a commercially available VR headset and one arbitrary loudspeaker, in regular rooms, thus increasing the user-friendliness for setups [105].

Most of those recent approaches consider spatially discrete positions of the listener and/or the loudspeakers. In order to tackle the trade-off between high spatial resolution and long measurement duration, other recent advances have been made towards spatially continuous measurement approaches [107–109]. These approaches enable the measuring of all directions around the listener for a single elevation within less than 4 minutes [110]. Certainly, an advantage of such an approach is the access to the spatially continuous information, which is important especially for frontal HRTF directions. With more and more silent turntables and swivelled chairs, achieving a high SNR is not a big issue. Most recent approaches related to the spatially continuous measurement utilise Kalman filters to acquire system parameters representing HRTFs, and thus speed up the HRTF measurement in a multi-channel setup [111]. Compared to spatially discrete approaches, the spatially continuous method can achieve accuracy within a spectral error of 2 dB [109].

The requirements of the room are not rigorous: In principle, the measurement room does not have to be perfectly anechoic, but it has to fulfil some requirements regarding size and reverberation time. Room modes may exist below 500 Hz as they can be neglected in that frequency range [1]. Acceptable measurement results can be obtained as long as the first room reflection arises after 5 ms such that the measured room impulse responses can be truncated without truncating the HRIRs. Medium and large surfaces, i.e., the mount of the loudspeakers, the loudspeaker arc, the turntable, listener seat, etc., can potentially cause acoustical reflections overlapping with the direct sound path within the first 5 ms of the HRTF. These

reflections are usually damped, e.g., by covering the speakers in absorption material. Before the measurement, the listener's head has to be aligned in the measurement setup, adjusting the ears to the interaural axis of the system and the head to the Frankfurt plane. This alignment can be supported by, e.g., a laser system. The orientation and position of the listener's head should be monitored throughout the measurement procedure in order to detect listener's unwanted movements or position drifts. This helps when having to repeat potentially corrupted measurements.

The loudspeakers used for the measurements need to show a fast impulse response decay; fast enough to not interfere with the temporal characteristics of the HRTFs. This can be achieved by using loudspeaker drivers with light membranes, simple electric processing and no acoustic feedback such as a bass-reflex system. The acoustic short-circuit usually limits the lower frequency range of the loudspeakers, and multidriver systems are a common solution to that problem. In order to achieve a spatially compact acoustic source in a multidriver system, it is common to use coaxial loudspeaker drivers with an omnidirectional directivity pattern in HRTF measurement systems [112].

The placement of the microphones can also be an issue. Early setups used an open ear canal where the microphones were positioned close to the eardrum [11]. However, the effect of the ear canal does not seem to be direction-dependent, and its consideration in the measurement introduces technical difficulties and a large measurement variance [19, 113, 114]. Nowadays, the microphones are usually placed at the entrance of the ear canal which is acoustically blocked [11, 20]. Blocking the ear canal can be achieved by using microphones enclosed in earplugs made from foam or silicone or by wrapping the microphone in skin-friendly tape before inserting it. Note that such a measurement captures all directionaldependent features of the acoustic filtering by the outer ear, however, the directional-independent filtering by the ear canal is not captured. All cables from the microphone have to be flexible enough to minimise their effect on the acoustics within the pinna—one way is to lead the cable through the incisura intertragica and secure it with tape on the cheek, see **Figure 6c**.

In general, system identification can be performed with a variety of excitation signals. While previously Golay codes or other broadband signals have been used [115], more recently, the multiple exponential sweep method (MESM) [112] has been established and further improved [116], enabling fast HRTF measurement at high SNRs, reducing the discomfort for the listener. Still because of the imperfections in the electro-acoustic setup, a reference measurement is required to estimate the basis of the measurement without the effect of the listener, i.e., to estimate *p*0. It is typically done for each microphone by placing the microphone in the centre of the measurement setup and recording the loudspeaker-microphone impulse response for all loudspeakers. The reference measurement can also be used to control the sound pressure level in order to avoid clipping at the microphones and analogue-digital converters. This can especially happen when each loudspeaker is driven within its linear range, but the overlapping signals from multiple loudspeakers raise the total level to ranges beyond the linear range of the recording system. Additionally, because of the HRTF's resonances, the level during the actual HRTF measurements can be up to 20 dB higher than that during the reference measurement, translating to a requirement for the headroom of at least 30 dB at the reference measurement. The maximum level is not only limited by the equipment; the listener's hearing range also needs to be considered, i.e., the maximum sound pressure level must neither create discomfort for the listener, nor go beyond the levels of safe listening. There is no special requirement for the listener regarding their audible threshold, hearing range or the visual sense. However, a particular

#### *Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

measurement equipment or a particular lab could have some restrictions on, e.g., the listener's weight or height.

**Figure 7** shows measurement grids of three exemplary setups and one measurement grid of a simulation setup. **Figure 7a** and **b** correspond to the measurement setups in **Figure 6a** and **b**. In these setups, not every loudspeaker plays a stimulus at every position around the listener. An extreme case is a loudspeaker positioned at 90<sup>∘</sup> elevation that needs to be only measured once. **Figure 7c** shows another setup with uniformly distributed measurement points, and **Figure 7d** shows a uniform sampling grid used in numerical calculation experiments [95].

The repeatability of the measurement is an important issue. Within a single laboratory, changes in the room conditions such as temperature and humidity, as well as changes in the setup such as the ageing of the equipment may compromise the repeatability of the HRTF measurement [11, 20]. When comparing the HRTFs measurement across the labs, differences in the setups play also a role. In interlaboratory and inter-method HRTF measurement comparison obtained for the same artificial head, severe ITD variations of up to 200 *μ* have been found [63, 64].

Once the HRTFs have been measured for all source positions, post-processing needs to be done before the HRTFs are ready to be used. First, in order to account for acoustic artefacts caused by the measurement room, a frequency-dependent windowing function is usually applied truncating the HRIRs [100, 117, 118]. Second, the measured HRIRs are equalised by the impulse response obtained from the reference measurements, i.e., with the microphone placed at the centre of the coordinate system with the listener absent. This equalisation can be either free-field or diffuse-field. For the free-field equalisation, the reference measurement is required only for the frontal direction (0° azimuth, 0° elevation) [54], whereas for the diffuse-field equalisation, the reference measurement is the root mean square (RMS) impulse response of all directions [75], and the results are commonly denoted as directional transfer functions (DFT) [119]. Third, in most common rooms and even in (semi)anechoic rooms, reflections (or room modes) cause artefacts below 400 Hz, confounding the free-field property of HRTFs. Additionally, most loudspeakers used in the measurement are not able to reproduce low frequencies with sufficient power. Since the listener's anthropometry has a small effect on HRTFs in the low-frequency range, HRTFs can be extrapolated towards lower frequencies with a constant magnitude and linear phase [20, 117]. Further postprocessing steps may include spectral smoothing to account for listener position

#### **Figure 7.**

*Four examples of spatial HRTF grid resolutions. (a) Almost spherical loudspeaker arc with moving listener, see also Figure 6a; the loudspeaker arc consists of 22 loudspeakers and yields 1550 directional measurements. Notice the higher resolution in the front of the listener as opposed to the lower resolution in the lateral regions. (b) Sparse measurement grid in a nearly spherical 91-loudspeaker array, see also Figure 6b, yielding 451 directional measurements with 5 different listener positions. (c) Measurement grid with 440 uniformly distributed points, measured with a loudspeaker arc with 37 loudspeakers from the HUTUBS database [95]. (d) Sampling grid for numerical calculations with 1730 directions. (c) and (d) are taken from the HUTUBS database [95].*

#### *Advances in Fundamental and Applied Research on Spatial Audio*

inaccuracies [60, 120] or adding a fractional delay to account for temperature changes followed by onset changes of the time signals [100].

The availability of acoustical HRTF measurements was a big step towards personalised binaural audio and virtual reality experience. However, even a fast or continuous measurement method requires the listener to sit still for a few minutes [104, 110, 112] in a specialised lab facility. Recent advances have been made towards both large-scale high-resolution and small-scale at-home easy-to-use solutions, providing HRTF acquisition to a large audience. Still, the imperfections in the electro-acoustic equipment set drawbacks of the acoustic measurement. Here, recent advances in the numeric calculations of the HRTFs can provide an interesting alternative.

## **4. Numerical calculation of HRTFs**

Generally, the calculation of HRTFs simulates the effects of the pinna, head and torso on the sound field at the eardrum. The goal is to numerically obtain the sound pressure at the two ears for a given set of frequencies and spatial positions. There are many methods to simulate wave propagation [121]. When applied to the HRTF calculation, all of the methods require a geometric representation of head and pinnae as input. For an accurate set of HRTFs, an exact 3D representation of the geometry, especially that of the pinnae with all their crests and folds, is of utmost importance [90]. The 3D geometry is represented using a discrete and finite set of elements, further denoted as 'mesh'. A mesh is a representation of the region of interest (ROI), i.e., the object's volume and surface, with the help of simple geometric elements. In most applications, the faces of these elements are assumed to be flat, which in turn explains the preference for triangular faces because they are always flat and therefore have one unique normal vector. This is not always the case for other shapes, e.g., quadrilaterals.

The requirements on the mesh have to consider geometrical as well as acoustical aspects. From the acoustic perspective, a typical rule of thumb for numerical calculation requires the average edge length (AEL) of elements to be at least a sixth of the smallest wavelength [122], which corresponds to an AEL of 3.5 mm for frequencies up to 16 kHz. However, in order to describe the pinna geometry sufficiently accurate, the average edge length (AEL) of the elements in the mesh needs to be around 1 mm, independently of the calculation method [90]. Some numerical calculation algorithms are, in general, more efficient and stable if the geometries are represented locally with elements of similar sizes and as regular as possible, e.g., almost equilateral triangles. To this end, the mesh may undergo a so-called *remeshing* [123], which inserts additional elements and resizes all elements to a similar size. **Figure 8** shows the same pinna in all panels, represented by meshes with increasing AELs from left to right.

#### **Figure 8.**

*Pinna meshes represented by various AELs [90]. From left to right: AEL of 1 mm, 2 mm, 3 mm, 4 mm and 5 mm. Note how the representations of the helix and fossa triangularis degrade with increasing AEL.*

*Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

Interestingly, only the pinna regions contributing to the HRTF (compare **Figure 4b**) require to be accurately represented [56] and the remainder of the geometry can be more roughly modelled. This applies especially to the head, torso and neck, which can be represented by larger elements. These anatomical parts can additionally be approximated by simple geometric shapes, e.g., a sphere for the head, a cylinder for the neck and a rectangular cuboid or an ellipsoid representing the torso [65], see e.g., **Figure 4a**. To emphasise the sophisticated direction dependency of the pinna, **Figure 9** shows the calculated sound pressure distribution over the surface of the pinna. This simulation is calculated by defining one element in the centre of the ear canal as a sound source and evaluating the resulting sound pressure field at the vertices of the rest of the geometry; the procedure is explained thoroughly in Section 4.4.

The geometry can be captured via numerous approaches [124]: a laser scan [125], medical imaging techniques such as magnetic resonance imaging (MRI) [69, 126] and computer tomography (CT) [90], or photogrammetric reconstruction [127]. Laser, MRI and CT scans yield high-resolution meshes offering a small geometric error, but in turn, they need a special equipment. The laser scans are based on line-of-sight propagation and are able to measure short distances with an accuracy of up to 0.01 mm. The downside of line-of-sight propagation is that the manifolds of the pinnae are not easy to capture. In the medical imaging approaches, different downsides arise; acquiring the pinnae geometry via MRI is not a trivial process because they are flattened by the head support. This leads to two separate MRI measurements of each ear. The anatomy is then captured in 'slices' that can be stitched together in the postprocessing rather easily. The CT captures the anatomy in a similar way, but due to the high radiation exposure, such scans are usually not done with human subjects but with (silicone) mouldings of the listener's ear. The overall procedure may take more time than an acoustic HRTF measurement and require the listener to either manufacture a moulding or meeting rather specific criteria for the scanning equipment (e.g., no tattoos, piercings, or implants). As an alternative, recent advances have been made for more widely applicable approaches such as photogrammetry [23, 128]. Photogrammetry is not only non-invasive but also can be done with widely available equipment, e.g., a smartphone or digital

#### **Figure 9.**

*Magnitude of the sound pressure calculated for each element of the surface for a 13-kHz sound source placed in the ear canal. Note the high dynamic range containing peaks (red) and notches (blue) in the distribution pattern in the area of the pinna.*

camera, without having the listener to travel to a specialised facility. In a nutshell, the photogrammetrical approach works as follows: a set of photographs from different directions is made for each ear [127, 129], the so-called *structure from motion* [130] approach estimates the camera positions by analysing the mutual features across the photographs; a 3D point cloud is constructed; and a 3D mesh is created by connecting the points in the cloud. Note, that currently, manual corrections (e.g., smoothing to reduce noise, filling holes) are still required to reach the high quality of the meshes required for accurate HRTF calculations.

Simulations of acoustics require the information about the acoustic properties of the simulated objects. The HRTFs can be simulated with the 3D geometry represented as fully reflective, i.e., all surfaces having infinite acoustic impedance. With respect to localisation performance, only a small *perceptual* difference was found between acoustically measured and HRTFs calculated for acoustically reflective surfaces [101]. However, the impedance of various regions such as skin and hair may influence the direction-independent HRTF properties and cause changes in the perceived timbre [95, 99, 100].

In order to calculate HRTFs with sufficient spectral accuracy, the number of elements needs to be in the range of several tens of thousands, which might be important for the requirements of the computational power. Such large numerical problems usually require large amount of memory being in the range of Gigabytes. Nevertheless, the calculation time may reach a few days, especially when calculating HRTFs for many frequencies with high-resolution meshes. Note that if the used algorithm calculates HRTFs for each frequency independently, the calculations can be performed in parallel, and computer clusters can be used. This reduces the calculation time to a few hours for HRTFs the full hearing range and a mesh of several tens of thousands of elements.

All the algorithms for numerical HRTF calculation are based on the propagation of sound waves in the free field around a scattering object (also "scatterer"), usually described by the Helmholtz equation

$$
\nabla^2 p(\mathbf{x}) + k^2 p(\mathbf{x}) = q(\mathbf{x}), \mathbf{x} \in \Omega\_\epsilon,\tag{2}
$$

where <sup>∇</sup><sup>2</sup> <sup>¼</sup> *<sup>∂</sup>*<sup>2</sup> *<sup>∂</sup>x*<sup>2</sup> <sup>þ</sup> *<sup>∂</sup>*<sup>2</sup> *<sup>∂</sup>y*<sup>2</sup> <sup>þ</sup> *<sup>∂</sup>*<sup>2</sup> *<sup>∂</sup>z*<sup>2</sup> denotes the Laplace operator in 3D, *p*ð Þ **x** denotes the (complex valued) sound pressure at a point **x** for a given wavenumber *k* in the domain Ω*<sup>e</sup>* around the scattering object and *q*ð Þ **x** denotes the (complex-valued) contribution of an external sound source at the acoustic field around the object. The wavenumber *k* is <sup>2</sup>*π<sup>f</sup> <sup>c</sup>* with *f* being the frequency and *c* the speed of sound.

In order to solve the Helmholtz equation for a given scatterer, boundary conditions are necessary. The *Neumann* boundary condition assumes the object to be acoustically hard, and the (scaled) particle velocity at the boundary can be set to zero,

$$\frac{\partial p(\mathbf{x})}{\partial \mathbf{n}} = \nabla p(\mathbf{x}) \cdot \mathbf{n} = \mathbf{0}, \qquad \mathbf{x} \in \Gamma,$$

where **n** denotes the normal vector at the surface pointing away from the object. For the boundary condition at infinite distance, the so-called *Sommerfeld* radiation condition can be applied,

$$\left\langle \frac{\mathbf{x}}{||\mathbf{x}||}, \nabla p \right\rangle - ikp(\mathbf{x}) = o\left(\frac{1}{||\mathbf{x}||}\right), \qquad ||\mathbf{x}|| \to \infty,$$

with *o*ð Þ*:* showing that the right side grows much faster than the left side. This ensures that the sound field decays away from the object [131].

*Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

For the calculation of HRTFs, the Helmholtz equation can be solved numerically by means of various approaches, which are based on a discretisation of the exterior domain Ω*<sup>e</sup>* around the scatter Γ. Some of these methods solve the Helmholtz equation in the frequency domain, and others solve its counterpart, the wave equation, in the time domain. In general, the formulations and the results in the different domains can be transformed into each other by using, e.g., the Fourier transformation. In the following, we describe the most prominent methods used for HRTF calculations.

#### **4.1 The finite-element method**

The finite-element method (FEM) solves the Helmholtz equation, Eq. (2), considering the scattering object or the domain around it as a volume [132]. **Figure 10** shows an example of a finite (domain) volume Ω*<sup>e</sup>* considered in the calculations with the scatterer with surface Γ placed inside that volume. To simulate the acoustic field around that object, the weak form of the Helmholtz equation is used, i.e., the equation is multiplied by a set of known test functions *w*ð Þ **x** and integrated over the whole domain, thus transforming the partial differential equation (e.g., the Helmholtz equation) into an integral equation, that can be easier solved numerically:

$$\int\_{\tilde{\Omega}} (\nabla^2 p(\mathbf{x}) + k^2 p(\mathbf{x})) w(\mathbf{x}) d\mathbf{x} = \int\_{\tilde{\Omega}\_\epsilon} q(\mathbf{x}) w(\mathbf{x}) d\mathbf{x}.\tag{3}$$

Secondly, the unknown pressure *p*ð Þ **x** is approximated by a linear combination

$$p(\mathbf{x}) = \sum\_{j=1}^{N} p\_j \phi\_j(\mathbf{x}) \tag{4}$$

**Figure 10.** *2D representation of meshes used in FEM. The elements are uniformly distributed and fitted to the boundary of the domain* Ω*e.*

of so-called ansatz functions *ϕ<sup>j</sup>* ð Þ **x** , *j* ¼ 1, … , *N*. These ansatz functions, or element shape functions, help at interpolating between the discrete solutions for each point of the mesh. They are, in general, simple (real) polynomials defined locally on the elements of the mesh, e.g., having the value of 1 at their own coordinates and zero for other points of the mesh. Recent advances have been made towards adaptively finding higher-order polynomials and thus drastically reducing the computational effort [133, 134]. In theory, Eq. (3) should be fulfilled for all possible test functions *w*ð Þ **x** , in practice, however, often the ansatz functions are also used as test functions, i.e., *wi*ð Þ¼ **x** *ϕi*ð Þ **x** . With this choice, Eq. (3) can be transformed into a linear system of equations

$$\mathbf{Kp} = \mathbf{g},\tag{5}$$

where

$$\mathbf{K}\_{\mathbf{ij}} = \int\_{\Omega\_{\epsilon}} \frac{d\phi\_i}{d\mathbf{x}} \frac{d\phi\_j}{d\mathbf{x}} - k^2 \phi\_i \phi\_j d\mathbf{x}, \qquad \mathbf{g}\_i = \int\_{\Omega\_{\epsilon}} q(\mathbf{x}) \phi\_i(\mathbf{x}) d\mathbf{x},$$

and **p** is the vector containing the unknown coefficients *pi* of the representation Eq. (4).

In general, the unknown coefficients *pi* represent the complex sound pressure at a given node **x***<sup>i</sup>* of the mesh. The integrals involved are solved using numerical methods [135].

When calculating HRTFs, the space around the scatterer is assumed to be continuous and infinite; in practice, this space has to be discretised and truncated to a finite domain by inserting a virtual boundary. When applied to the calculation of HRTFs, a virtual boundary of the (now finite) domain Ω*<sup>e</sup>* needs to be defined and conditions have to be set to avoid any reflections from this boundary, thus keeping in line with anechoic or free-field conditions. There are several methods to do so, with the socalled perfectly matched layers (PMLs) being the most popular among the reviewed HRTF calculation approaches. The PML is created by inserting an artificial boundary inside Ω*e*, e.g., a sphere with sufficiently large radius, and artificially damped equations are used to represent a solution that can then be numerically calculated, fulfilling the Sommerfeld radiation condition. Recent advances have been made to define PMLs automatically by extruding the boundary layer of the mesh and obtaining the geometric parameters during the extrusion step [136].

The FEM has been widely used in HRTF calculations [137–141] and yields similar results to acoustical HRTF measurements with spectral magnitude errors of approximately 1 dB [137, 141]. The downside, however, is the need to model 3D volumes around the head, resulting in models of a high number of elements, having a strong impact on the calculation duration.

#### **4.2 The finite-difference time-domain method**

A similar approach as the FEM can also be followed in the time domain. By using a short sound burst in the time domain as an input signal, the HRTFs within a wide frequency range can be calculated at once. This approach is called the finitedifference time-domain (FDTD) method [142] and can be derived by solving the wave equation in the time domain

$$
\nabla^2 \bar{p}(\mathbf{x}, t) - \frac{\partial^2 \bar{p}(\mathbf{x}, t)}{\partial t^2} = \bar{q}(\mathbf{x}, t),
$$

*Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

where *p ^* and *q ^* denote sound pressure fields in the time domain. The PML is applied to create the boundary conditions for outgoing sound waves. The evaluation grid is sampled evenly in cells across the domain with grid spacing *h*, considering discrete time steps *m*. A key parameter for numerical stability of the FDTD is the Courant number

$$
\beta = \frac{cm}{h},
$$

defining the number of cells the sound propagates per time step. Typically, in order to obtain stable HRTF calculations, the Courant number is *<sup>β</sup>* <sup>¼</sup> <sup>1</sup>*<sup>=</sup>* ffiffiffi <sup>3</sup> <sup>p</sup> .

**Figure 11** shows a 2D representation of a mesh used in the FDTD method. Note that because the mesh needs to consist of evenly spaced elements, most of the objects cannot be represented accurately and a sampling error is introduced at the boundary surface Γ of the object. Additionally, as derivatives of functions are approximated by finite differences, the arithmetic operations are valid for infinite resolution, but when calculating on physical computers, the precision depends on the number format used and the gridsize, introducing errors in the results [143]. Refining the grid is a potential solution to the sampling error for staircase approximations [144, 145], and when framing this problem to HRTF calculations, spectral magnitude errors of 1 dB up to 8 kHz and 2 dB up to 18 kHz can be achieved [146, 147], suggesting only negligible increase in localisation errors when listening to HRTFs calculated by the FDTD method.

Because of the additional sampling errors for irregular domains, recent advances have been made towards using quasi-cartesian grids [148], dynamically choosing grid resolutions [149], or towards the finite-volume method (FVTD), which is based on energy conservation and dissipation of the system as a whole and uses the integral formulation of the FDTD [150]. One solution approach there is to adaptively sample the grid at the boundary and introduce unstructured or fitted cells [151, 152]. A thorough comparison between FEM, FDTD and FVTD methods is available in [153].

In fact, the FDTD method has been widely applied to HRTF calculations [145, 146, 154, 155], and it certainly offers the advantage of calculating broadband

#### **Figure 11.**

*2D representation of meshes used in the FDTD method. Note that in this representation, the object surface* Γ *does not line up exactly with the sampling grid.*

HRTFs while not introducing additional computational cost when multiple inputs or outputs are used. However, because of the complex geometry of the pinnae, a submillimetre sampling grid is required, resulting in the need for a delicate preprocessing.

### **4.3 The boundary-element method**

The boundary element method (BEM) is based on a special set of test functions in the weak formulation of the Helmholtz equation Eq. (3), namely the Green's function

$$G(\mathbf{x}, \mathbf{y}) = \frac{e^{ikr}}{4\pi r},$$

where *r* ¼ k**x** � **y**k, and **x** and **y** are two points in space. By using this function, it is possible to reduce the weak form of the Helmholtz equation to an integral equation, i.e., the boundary integral equation (BIE), that only involves integrals over the surface Γ of the object, and *not* the volume Ω*e*. Assuming that the external sound source as a point source at **x**<sup>∗</sup> , and an acoustically reflecting (= sound hard, ∇*p* **y** � � � **<sup>n</sup>** <sup>¼</sup> 0, **<sup>y</sup>**∈Γ) surface, the sound field at a point **<sup>x</sup>** on a smooth part of that surface Γ is given by:

$$\frac{1}{2}p(\mathbf{x}) - \int\_{\tilde{\Gamma}} H(\mathbf{x}, \mathbf{y})p(\mathbf{y})d\mathbf{y} = G(\mathbf{x}^\*, \mathbf{x})p\_0,\tag{6}$$

where *<sup>H</sup>* **<sup>x</sup>**, **<sup>y</sup>** � � <sup>¼</sup> *<sup>∂</sup><sup>G</sup> <sup>∂</sup>***n***<sup>y</sup>* **<sup>x</sup>**, **<sup>y</sup>** � � is obtained by the derivative of the Green's function at a point **y** in the direction of vector **n** normal to the surface at this position, and *p*<sup>0</sup> denotes the strength of the sound source.

In comparison with the other two methods, the BEM has the advantage that only the *surface* of the object such as the head and the pinnae needs to be discretised, whereas in FEM and the FDTD method also a discretisation of the *volume* surrounding the head has to be considered, see **Figures 10**–**12**. Thus, in the boundary element method, all calculations can be reduced to a manifold described in 2D, in our case, the domain of interest is reduced to the surface of the head. A second advantage of the BEM is that by using the Green's function, the Sommerfeld radiation condition is automatically fulfilled. Additionally, no domain boundary has to be

#### **Figure 12.**

*2D representation of a BEM mesh. Note that only the boundary of the surface* Γ *is considered and the domain volume* Ω*<sup>e</sup> does not have to be sampled.*

*Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

introduced, such as the PML. This renders the BEM an attractive method for calculating sound propagation in infinite domains, i.e., in free-field, as is the assumption when calculating HRTFs [156].

In order to solve a BEM problem, the BIE is discretized and solved by using methods such as the Galerkin, collocation or Nyström [157–159], all with the common goal of yielding a linear system of equations.

For the Galerkin method, the unknown pressure is approximated by a linear combination of ansatz functions as in Eq. (4). The BIE is again multiplied with a set of test functions (similar to the test functions *ϕi*ð Þ **x** used in FEM) and integrated with respect to **x** yielding a linear system of equations as in Eq. (5), where

$$\mathbf{K}\_{\mathbf{ij}} = \int\_{\Gamma} \frac{\phi\_i(\mathbf{x})\phi\_j(\mathbf{x})}{2} d\mathbf{x} - \iint\_{\Gamma} \phi\_i(\mathbf{x}) H(\mathbf{x}, \mathbf{y}) \phi\_j(\mathbf{y}) d\mathbf{y} d\mathbf{x},$$

and

$$\mathbf{g}\_i = \underset{\stackrel{\Gamma}{\leftarrow}}{\int} G(\mathbf{x}\_0 - \mathbf{x}) \phi\_i(\mathbf{x}) d\mathbf{x}.$$

Another commonly used approach especially used in engineering is collocation with constant elements, i.e., the sound field is assumed to be constant on each element of the mesh, and the BIE is solved at the midpoints **x***<sup>i</sup>* of each element (the set of all **x***<sup>i</sup>* are called collocation nodes) yielding a linear system of equations as in Eq. (5), where

$$\mathbf{K}\_{\mathbf{ij}} = \frac{1}{2}\delta\_{\mathbf{ij}} - \int\_{\Gamma\_j} H(\mathbf{x}\_i, \mathbf{y})d\Gamma, \qquad \mathbf{g}\_i = G(\mathbf{x}^\*, \mathbf{x}\_i)p\_0.$$

**<sup>p</sup>***<sup>i</sup>* <sup>¼</sup> *<sup>p</sup>*ð Þ **<sup>x</sup>***<sup>i</sup>* and with **<sup>x</sup>**<sup>∗</sup> being the position of the sound source outside the head. The integrals over each element are numerically solved using appropriate quadrature formulas (weighted sum of function values) and

$$
\delta\_{\vec{\eta}} = \begin{cases}
\mathbf{1} & \text{for } i = j \\
\mathbf{0} & \text{for } i \neq j
\end{cases}
$$

The BIE is solved for a given set of frequencies and the solutions **p***<sup>i</sup>* at the collocation nodes are then used to derive the HRTFs. Note that the collocation method can be interpreted as the Galerkin method utilising the delta functionals *δ*ð Þ **x***<sup>i</sup>* � **x** as test functions. A thorough comparison between Galerkin and collocation approaches can be found in [160].

The discretisation of just the surface introduces additional challenges. First, the Green's function becomes singular at the boundary where k**x** � **y**k ¼ 0 and special quadrature formulas need to be used close to these singularities [161, 162]; and second, the system matrix **K**, although small, is usually densely populated, which poses a challenge for computer memory and the efficiency of the linear solver used. When considering HRTF calculations for frequencies up to 20 kHz, high-resolution meshes are required and the corresponding linear systems may contain up to 100,000 unknowns.

In order to efficiently deal with such large systems, the BEM can be coupled with methods speeding up matrix–vector multiplications, such as the fast-multipole method (FMM) [163] or H-matrices [164] (so-called 'hierarchical' matrices). These methods have enabled an efficient and feasible calculation of HRTFs [165]. In a

nutshell, these methods aim at providing a method for an efficient and fast matrix– vector multiplication and are based on two steps. First, the elements of the mesh are grouped into clusters of approximately the same size with cluster midpoints **z***i*. Second, for two clusters C<sup>1</sup> and C2, that are sufficiently apart from each other, a separable approximation of the Green's function

$$G(\mathbf{x}, \mathbf{y}) \approx G\_1(\mathbf{x} - \mathbf{z}\_1) M(\mathbf{z}\_1, \mathbf{z}\_2) G\_2(\mathbf{y} - \mathbf{z}\_2)$$

is found. This approximation has two advantages: the local expansions *G*<sup>1</sup> and *G*<sup>2</sup> have to be made only once for each cluster, and the interaction between the elements of different clusters is reduced to a single interaction of the cluster midpoints. The resulting linear system of equations is then solved using an iterative equation solver [166].

Although the Helmholtz equation for external problems has a unique solution at all frequencies, the BIE has uniqueness problems at certain critical frequencies [159, 167]. Thus, to avoid numerical problems, the BEM needs to be stabilised at these frequencies, e.g., by using the Burton-Miller method [167]. BEM has been widely used to calculate HRTFs [165, 168–171] analysing the process from various perspectives. When applied to an accurate and high-resolution representation of the pinna geometry, BEM can yield similar results to the acoustic HRTF measurements by means of sound localisation performance [101, 172].

#### **4.4 Reciprocity**

In principle, in order to calculate an HRTF set, the Helmholtz equation needs to be solved for every source position **x**<sup>∗</sup> *<sup>j</sup>* separately, leading in up to several thousand right-hand sides in Eq. (5). Solving that many equations cannot be done quickly even with the help of iterative solvers. On the other hand, the HRTF calculation for the second ear is quite simple because the solution obtained from the solver is already available for every element of the surface, i.e., at the element representing the ear canal of the second ear. The approach of reciprocity can help to significantly speed up the calculations by swapping the role of the many source positions with the two elements representing the ear canals, requiring us to solve Eq. (5) only twice, i.e., once for each of the ears.

Helmholtz' reciprocity theorem states that switching source and receiver positions do not affect the observed sound pressure. When applied to HRTF calculations, virtual loudspeakers are placed in the entrance of the ear canal (replacing the virtual microphones) and the many simulated sound sources are represented by many virtual microphones (replacing the many virtual loudspeakers around the listener). By doing so, the computationally expensive part of the BEM, i.e., solving a linear system of equations to calculate the sound pressure at the surface, needs to be done only twice, namely once for each ear. Subsequently, the sound pressure at positions around the head can be calculated fairly easy and efficiently.

In more detail, assume that a point source with strength *p*<sup>0</sup> at the position **x**<sup>∗</sup> *j* causes a mean sound pressure of *p* over a small domain with area *A*<sup>0</sup> at the entrance of the ear canal. If the domain is sufficiently small, the mean sound pressure is an accurate representation of the actual sound pressure in this domain. By applying the reciprocity, we introduce a reciprocal sound source at the entrance of the ear canal which introduces a velocity *v*<sup>0</sup> and then calculate the sound pressure *p* **x**<sup>∗</sup> *j* at the actual sound-source position around the listener. The pressures *p* **x**<sup>∗</sup> *j* and *<sup>p</sup>* are linked by

*Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions… DOI: http://dx.doi.org/10.5772/intechopen.102908*

$$
\overline{p} = \frac{p\_0 p\left(\mathbf{x}\_j^\*\right)}{A\_0 v\_0}.
$$

The reciprocal sound source can be modelled by vibrating elements Γvib, i.e., elements with an additional velocity boundary condition

$$v\left(\mathbf{y}\right) = \frac{1}{i\alpha\rho} \frac{\partial p}{\partial \mathbf{n}} = v\_0, \qquad \mathbf{y} \in \Gamma\_{\text{vib}}$$

where *ω* ¼ 2*π f* <sup>ℓ</sup>, and *ρ* is the density of air. Note that *v*<sup>0</sup> can be an arbitrary positive number because when calculating HRTFs [see for example Eq. (1)], the pressure is normalised by the reference pressure *p*0, thus cancelling *v*0. With this additional boundary condition, first, BEM can be used to calculate the sound field at the surface Γ, and then, Green's representation is applied to calculate the pressure at all positions of the actual sound sources **x**<sup>∗</sup> *j* ,

$$p\left(\mathbf{x}\_{j}^{\*}\right) - \int\_{\stackrel{\circ}{\Gamma}} \left(H\left(\mathbf{x}\_{j}^{\*},\mathbf{y}\right)p\left(\mathbf{y}\right)d\mathbf{y} + i\alpha\rho\underset{\stackrel{\circ}{\Gamma}}{\operatorname{G}}G\left(\mathbf{x}\_{j}^{\*},\mathbf{y}\right)\nu\left(\mathbf{y}\right)d\mathbf{y} = \mathbf{0}.\right)$$

Note that this equation is calculated after a discretisation, and because *p* **y** � � at the surface is known from the BEM solution, the calculation of the sound pressure around the head *p* **x**<sup>∗</sup> *j* � � is a simple matrix multiplication.

Reciprocity, combined with FMM-coupled BEM has been applied to calculate HRTFs, enabling calculations for a large spatial HRTF set within a few hours even on a standard desktop computer [172].

### **5. Other issues related to HRTF acquisition**

Over decades, HRTFs have been collected and stored in databases. Such databases are important for educational aspects, training of neural network algorithms [34, 37] and further research [23, 25–28, 173]. While in the early HRTF research days, HRTFs have been stored by each lab in a different format, since 2015, the spatially oriented format for acoustics (SOFA) is available to store HRTFs in a flexible but well-described way facilitating an easy exchange between the labs and applications. SOFA is a standard of the Audio Engineering Society under the name AES69. SOFA provides a uniform description of spatially oriented acoustic data such as HRTFs, spatial room impulse responses, and directivities [15].

When it comes to anthropometric data, unfortunately, there is currently no common format to specify and exchange anthropometric data. This is partially because currently, it is not known, which data are important. Some laboratories use the CIPIC parameters [88], some have extended them [174], and others have created whole new sets of parameters [128, 175]. An overview of currently used anthropometric parameters can be found in [176]. The development of parametric pinna models may shed light on the relevance of parameters needed to be stored in the future. The listener's geometry can also be stored in non-parametric representations such as meshes and point clouds of listener's ears and head. To this end, typical 3D dataset formats are used, e.g., OBJ, PLY or STL. These formats are widely used in computer graphics and thus easily accessible by many corresponding applications. A large collection of HRTF databases stored in SOFA, with some of them

combined with meshes stored in OBJ, PLY and STL files is available at the SOFA website.<sup>1</sup>

When HRTFs are obtained, there is strong demand to evaluate their quality. This is especially interesting when comparing the results from numerical HRTF calculations. The evaluations can be performed at various levels: geometrical, acoustical and perceptive. The evaluation at the geometric level can be done by comparing the deviation between two meshes of the pinna and representing the deviation as the Hausdorff distance [177]. The evaluation at the acoustic level can be done by calculating the spectral distortion

$$SD = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} \left( 20 \log \frac{\widehat{Harg} \text{F} \left( \mathbf{x}^\*, f\_i \right)}{HRT \text{F} \left( \mathbf{x}^\*, f\_i \right)} \right)^2},\tag{7}$$

where *HRTF* <sup>d</sup> **<sup>x</sup>**<sup>∗</sup> , *fi* � � denotes the calculated and *HRTF* **<sup>x</sup>**<sup>∗</sup> , *fi* � � the measured one, summarised over *N* discrete frequencies. The evaluation at the perceptual level can be simulated by means of auditory modelling [50] or direct performance of localisation experiment [50, 90, 147]. Especially the evaluation of localisation errors in the median plane can be relevant because the sound localisation in the median plane is directly related to the quality of monaural spectral features in the HRTFs [46, 178]. A calculated HRTF set yielding similar perceptual results as the natural listener's HRTFs can be described as *perceptually valid*.
