**1. Introduction**

Recent developments in the field of virtual and augmented reality have increased the demand for high-fidelity binaural reproduction technology [1]. Such technology aims to reproduce the spatial sound scene at the listener's ears through a pair of headphones, providing an immersive virtual sound experience [2]. The two main acoustic processes producing the binaural signals are the spatial sound-field resulting from the propagation from the sound source to the listener, and the interaction of this sound-field with the listener's body, which is described by the Head-Related Transfer Function (HRTF)<sup>1</sup> [3]. Binaural signals can be obtained directly using binaural microphones at the listener's ears [4]. In this way, the sound-field and the

<sup>1</sup> The term "HRTF" is used in this chapter to refer to the set of transfer functions for a set of source positions, unless stated otherwise.

HRTF are jointly captured and the reproduced binaural signals are limited to the recording scenario. More flexible reproduction, enabling, for example, the use of individual (personalized) HRTFs and head-tracking, can be obtained by rendering the binaural signals in post-processing. This requires the sound-field and the HRTF to be available separately. The HRTF could be obtained from an online database, or it could be measured acoustically or simulated numerically for an individual listener [5]. The sound field could also be simulated numerically, or captured using a microphone array [6–8].

The rendering of binaural signals using an Ambisonics representation of the sound-field has previously been proposed [9–11]. The Ambisonics signals are the Spherical-Harmonics (SH) domain coefficients of the plane-wave amplitude density function, which encode the directional information of the sound-field. The binaural signals are computed by summing the products of the Ambisonics signals and the SH representation of the free-field HRTF. This offers the flexibility to manipulate either the sound field or the HRTF or both by employing algorithms that operate in the SH domain [12, 13].

The Ambisonics signals of a measured sound-field can be obtained from spherical microphone array recordings [14]. In practice, these arrays have a limited number of microphones, which limits the usable SH order [15]. A similar order limitation may also apply for a simulated sound-field due to computational efficiency considerations or memory usage [1, 16]. This order limitation places a constraint on the maximum SH order of the employed HRTF, which leads to truncation error [17]. Truncation error results in significant artifacts, both in frequency and in space, which have a detrimental effect on the perception of the reproduced binaural signals, for example, on the localization, source width, coloration and stability of the virtual sound source [18–21]. One way to overcome the limitations of low-order Ambisonics is by a parametric representation of the sound field, for example, using DirAC [22], COMPASS [23], SPARTA [24] or HARPEX [25]. However, these approaches may introduce errors due to incomplete parameterization and thus do not provide an ideal solution.

The HRTF truncation error can be reduced by pre-processing that lowers its effective SH order [26]. Evans et al. [27] suggested aligning the HRTF in the time domain prior to deriving its SH decomposition, and showed that this reduces the effective SH order significantly. They also showed that representing separately the magnitude and the unwrapped phase of the HRTF results in a lower SH order for both, compared to the complex-frequency representation. Romigh *et al.* [28] suggested using a minimum-phase representation of the HRTF, together with a logarithmic representation of the magnitude, and showed that an SH order as low as 4 is sufficient to achieve localization performance that is comparable with that of real sound sources in free-field. Brinkmann and Weinzierl [26] compared these methods (among others), and concluded that the time-alignment method requires the lowest SH order in terms of SH energy distribution and Just Noticeable Difference (JND) in binaural models for source localization, coloration and correlation. Recently, a new method for efficient SH representation of HRTFs, which is based on ear-alignment, was presented [29]. This method proved to be more robust than the time-alignment method, while achieving a similar reduction in the effective SH order.

The order reduction of the HRTF using all the above methods is based on manipulating its phase component. However, the use of such a pre-processed HRTF for binaural reproduction using Ambisonics signals is not trivial due to the relation between the phases of the HRTF and the sound-field; hence, alternative solutions have also been explored. In [30], Zaunschirm *et al.* presented a method that uses a pre-processed HRTF, obtained by means of frequency-dependent time-alignment,

*Binaural Reproduction Based on Bilateral Ambisonics DOI: http://dx.doi.org/10.5772/intechopen.100402*

to reproduce binaural signals in the SH domain using constrained optimization. They suggested pre-processing of the HRTF by removing its linear-phase component at high frequencies. Schörkhuber *et al.* further developed this approach in [31], where they presented the Magnitude Least-Squares (MagLS) method that performs magnitude-only optimization at high frequencies. Although the linear-phase component at high frequencies may be less important for lateral localization [32, 33], its removal still introduces errors in the binaural signal, and may affect other perceptual attributes [34, 35]. In [36], Lübeck *et al.* showed that the MagLS method achieved similar perceptual improvement to previously suggested diffuse field equalization methods for binaural reproduction [19, 37]. In [38], Jot *et al.* presented the Binaural B-Format approach, which uses first order Ambisonics signals at the location of the listener's ears and a minimum-phase approximation of the HRTF to compute the binaural signals directly at the listener's ears. This approach was further studied in [39, 40], along with several other approaches also based on the linear decomposition of the HRTF over spatial functions. Recently, the Binaural B-Format was extended to an arbitrary SH order using Bilateral Ambisonics reproduction [41, 42], which uses the ear-aligned HRTF and preserves the HRTF phase information. This method significantly reduces the truncation error and was shown to outperform current state-of-the-art methods using MagLS with low SH order reproduction. However, using Bilateral Ambisonics imposes challenges on acquiring the sound-field, and, specifically, on applying head-rotations to the reproduced binaural signal.

This chapter presents a detailed description of the Bilateral Ambisonics method, from HRTF representation to reproduction, including a possible solution for head tracking. The performance of the method is evaluated and compared with current state-of-the-art methods.

### **2. Basic Ambisonics reproduction**

This section provides an overview of the currently used formulation for binaural reproduction using Ambisonics signals, denoted here as Basic Ambisonics. The binaural signal, which is the sound pressure observed at each of the listener's ears, can be calculated, in the general case of a sound-field composed of a continuum of plane-waves, by [7, 16]:

$$p^{L\backslash R}(k) = \int\_{\Omega \in \mathbb{S}^2} a(k, \Omega) h^{L\backslash R}(k, \Omega) \, d\Omega,\tag{1}$$

where $a(k, \Omega)$ is the plane-wave amplitude density function, $\Omega \equiv (\theta, \phi) \in \mathbb{S}^2$ is the spatial angle in standard spherical coordinates, with elevation angle $\theta \in [0, \pi]$, which is measured downwards from the Cartesian $z$ axis, and azimuth angle $\phi \in [0, 2\pi)$, which is measured counter-clockwise from the Cartesian $x$ axis in the $xy$-plane. $k = 2\pi f / c$ is the wave number, $f$ is the frequency, and $c$ is the speed of sound. $h^{L\backslash R}(k, \Omega)$ is the left ear, $L$, or right ear, $R$, HRTF, which is the acoustic transfer function from a far-field sound source to the listener's ear [3]. $p^{L\backslash R}(k)$ is the sound pressure at the ear, and $\int_{\Omega \in \mathbb{S}^2} (\cdot)\, d\Omega \equiv \int_0^{2\pi} \int_0^{\pi} (\cdot) \sin(\theta)\, d\theta\, d\phi$.
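As an illustration, the integral in Eq. (1) can be approximated by simple quadrature on an equiangular grid. The Python sketch below uses hypothetical callables `a` and `h` standing in for the plane-wave amplitude density and the HRTF at a fixed wave number; the grid resolution and the function interface are assumptions for illustration, not part of the original formulation.

```python
import numpy as np

def binaural_pressure(a, h, n_theta=90, n_phi=180):
    """Numerical quadrature of Eq. (1): p(k) = integral over the sphere
    of a(k, Omega) * h(k, Omega) dOmega, with dOmega = sin(theta) dtheta dphi.
    a, h: callables of (theta, phi) arrays returning complex values for
    one fixed wave number k (illustrative interface)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta      # midpoint grid
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    dA = (np.pi / n_theta) * (2.0 * np.pi / n_phi)
    return np.sum(a(T, P) * h(T, P) * np.sin(T)) * dA
```

As a sanity check, setting both integrands to unity reduces the integral to the surface area of the unit sphere, $4\pi$.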

Alternatively, the binaural signal can be calculated in the SH domain, leading to the Basic Ambisonics reproduction formulation [10]:

$$p^{L\backslash R}(k) = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} \left[\tilde{a}\_{nm}(k)\right]^{\*} h\_{nm}^{L\backslash R}(k),\tag{2}$$

where $h_{nm}^{L\backslash R}(k)$ are the SH coefficients of the HRTF, which can be computed by applying the spherical Fourier transform (SFT) to the HRTF, $h^{L\backslash R}(k, \Omega)$. $\tilde{a}_{nm}(k)$ are the Basic Ambisonics signals, which are the SFT of $[a(k, \Omega)]^*$, where $[\cdot]^*$ denotes the complex conjugate. These Ambisonics signals can be calculated by capturing the sound-field using a spherical microphone array and applying plane-wave decomposition in the SH domain [14, 43].

In practice, the infinite summation in Eq. (2) will be order limited:

$$p^{L\backslash R}(k) = \sum\_{n=0}^{N} \sum\_{m=-n}^{n} \left[\tilde{a}\_{nm}(k)\right]^{\*} h\_{nm}^{L\backslash R}(k),\tag{3}$$

with $N = \min(N_a, N_h)$ [44], where $N_a$ and $N_h$ are the maximum available orders of the Ambisonics signals and the HRTF, respectively. For example, when the Ambisonics signal is derived from spherical microphone array recordings, such as the Eigenmike [45], its order will be limited by the number of microphones; for the Eigenmike case, with 32 microphones, the order is around $N_a = 4$ [46]. A similar order limitation may also be introduced for a simulated sound-field in practical applications. On the other hand, Zhang *et al.* [17] showed that the HRTF is inherently of high spatial order. They concluded that for a physically accurate representation up to 20 kHz, an order above $N_h = 40$ is required. Therefore, in the practical scenario of $N_a = 4$, the HRTF will be severely truncated by the reproduction order, $N = 4$. This order truncation was shown to have a detrimental effect on the perceived spatial sound quality [18, 19], affecting both the spectral and the spatial characteristics of the binaural signal.
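Eq. (3) is simply a truncated inner product of coefficient vectors. A minimal sketch, assuming both sets of SH coefficients for a single wave number are stacked in ACN order ((n, m) maps to index n² + n + m); this indexing convention and the function name are assumptions for illustration:

```python
import numpy as np

def render_basic_ambisonics(a_nm, h_nm, N):
    """Order-limited Basic Ambisonics rendering, Eq. (3):
    p = sum over n <= N and m of conj(a_nm) * h_nm, for one ear and
    one wave number. Both inputs must hold at least (N + 1)^2 entries
    in ACN order."""
    num = (N + 1) ** 2
    return np.sum(np.conj(np.asarray(a_nm)[:num]) * np.asarray(h_nm)[:num])
```

Note that with $N_a = 4$ the sum uses only 25 coefficients, regardless of how many HRTF coefficients are available; this is exactly the truncation discussed above.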

### **3. Basic vs. ear-aligned HRTF representations**

An efficient representation of the HRTF that reduces its effective SH order could mitigate the effect of the truncation error on the reproduced binaural signal caused by the limited-order HRTF.

Recently, several pre-processing methods have been developed with the aim of reducing the effective SH order of the HRTF: for example, by time-alignment [27, 30], using directional equalization [47], using minimum-phase representation [28], or by ear-alignment [29, 48]. All these methods are based on manipulating the linear-phase component of the HRTF, which was shown to be the main contributor to the high-order nature of the HRTF [27].

Ear-alignment has been shown to be a robust method for reducing the effective SH order of the HRTF, while preserving the HRTF phase information and the Interaural Time Difference (ITD) [29], which are both important cues for sound source localization [5]. The alignment is performed by translating the origin of the free-field component of the HRTF from the center of the head to the position of the ear. This translation significantly reduces the effective SH order of the HRTF, as described next.

#### **3.1 The effect of dual-centering on the basic SH representation of the HRTF**

We denote the SH representation of the HRTF as the "basic representation". In this section, the effect of translating the origin of the free-field component of the HRTF on the basic representation is presented. This is performed by analyzing the simple case of a "free-field HRTF", as outlined in [29].

A pair of far-field HRTFs, $h^L$ and $h^R$, is defined as a function of direction, $\Omega$, and wave-number, $k$, by [3]:

$$h^{L\backslash R}(k,\Omega) = \frac{P^{L\backslash R}(k,\Omega)}{P\_0(k,\Omega)},\tag{4}$$

where $P^L$ and $P^R$ represent the sound pressure at the left and right ears, respectively, and $P_0$ represents the free-field sound pressure at the position of the center of the head, with the head absent.

Now, consider a single plane-wave in free-field arriving from direction $\Omega$ with unit amplitude and wave number $k$. The sound pressure at position $(\Omega_0, r)$ can be written as [49]:

$$\begin{split} P\_0(\Omega, k, \Omega\_0, r) &= e^{ikr\cos\Theta} \\ &= \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} 4\pi i^n j\_n(kr) \left[ Y\_n^m(\Omega) \right]^\* Y\_n^m(\Omega\_0), \end{split} \tag{5}$$

where $\Theta$ is the angle between $\Omega$ and $\Omega_0$, $Y_n^m(\cdot)$ is the complex SH basis function of order $n$ and degree $m$ [50], and $j_n(\cdot)$ is the spherical Bessel function.

Defining the position of the ear to be at $(\Omega_{L\backslash R}, r_a)$, where $r_a$ is the radius of the head, the free-field HRTF (an HRTF with the head absent) is defined by substituting Eq. (5) into Eq. (4):

$$h\_0^{L\backslash R}(k, \Omega) = \frac{P\_0\left(\Omega, k, \Omega\_{L\backslash R}, r\_a\right)}{P\_0(\Omega, k, \Omega\_0, 0)} = P\_0\left(\Omega, k, \Omega\_{L\backslash R}, r\_a\right),\tag{6}$$

where $P_0(\Omega, k, \Omega_0, 0) = 1$ for all $(\Omega, k)$. Thus, for a sound-field composed of plane-waves from directions $\Omega \in \mathbb{S}^2$, the free-field HRTF can be written as:

$$h\_0^{L\backslash R}(k,\Omega) = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} 4\pi i^{n} j\_n(kr\_a) \left[ Y\_n^m(\Omega) \right]^\* Y\_n^m \left(\Omega\_{L\backslash R}\right). \tag{7}$$

From here, the SH coefficients of the free-field HRTF can be derived, as presented in [29]:

$$h\_{nm\,0}^{L\backslash R}(k) = 4\pi i^{n} j\_{n}(kr\_{a}) \left[ Y\_{n}^{m} \left( \Omega\_{L\backslash R} \right) \right]^{\*}.\tag{8}$$

This equation provides insight into the potential effect of the dual-centering measurement process of the HRTF. The free-field HRTF coefficients, as described in the equation, have energy at every order $n$, which means that the HRTF is of infinite SH order. Nevertheless, it can be considered to be approximately order limited by $N_h = \lceil kr_a \rceil$, where $\lceil \cdot \rceil$ is the ceiling function, due to the behavior of the spherical Bessel function, which has a negligible magnitude for $n > kr$ [49, 51]. On the other hand, from Eq. (6) it is clear that if the position of the ear were defined as the origin of the coordinate system, with $r_a = 0$, the free-field HRTF would be constant with unity value, which is represented by a zero-order SH. This demonstrates how a sound pressure measured at a distance $r_a$ from the origin, when normalized by a sound pressure at the origin, can lead to an increase in the SH order of approximately $N = \lceil kr_a \rceil$. An example of this added order is illustrated in **Figure 1**, which demonstrates how the SH order increases up to 30 at high frequencies.

Note the similarity of the orders in **Figure 1** to the actual order of the HRTFs as presented in [17], which suggests that although the explanation presented in this section is theoretical, it gives an insight into the possible increase in SH order due to the dual-centering of the HRTF definition.
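The order-limiting behavior of Eq. (8) can be checked numerically. The sketch below evaluates only the $m = 0$ coefficients, using $Y_n^0(\theta) = \sqrt{(2n+1)/4\pi}\, P_n(\cos\theta)$ so that standard Bessel and Legendre routines suffice; the function name and the default $r_a = 8$ cm are illustrative assumptions.

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def freefield_hrtf_coeff_m0(n, k, theta_ear, r_a=0.08):
    """m = 0 SH coefficient of the free-field HRTF, Eq. (8):
    h_{n0,0}(k) = 4*pi * i^n * j_n(k*r_a) * conj(Y_n^0(ear direction)).
    Y_n^0 is real-valued, so the conjugation is a no-op here."""
    Y_n0 = np.sqrt((2 * n + 1) / (4 * np.pi)) * eval_legendre(n, np.cos(theta_ear))
    return 4 * np.pi * (1j ** n) * spherical_jn(n, k * r_a) * Y_n0
```

For $k r_a = 8$ (roughly 5.5 kHz with $r_a = 8$ cm), coefficients with $n$ well above $kr_a$ are negligible, consistent with the $N_h = \lceil kr_a \rceil$ behavior described above.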

#### **3.2 HRTF ear-alignment**

To compensate for the effect described in the previous section, with the aim of reducing the effective SH order of the HRTF, ear-alignment of the HRTF is suggested. The ear-aligned HRTF, *ha*, is defined in a similar way to Eq. (4) as:

$$h\_a^{L\backslash R}(k,\Omega) = \frac{P^{L\backslash R}(k,\Omega)}{P\_0^{L\backslash R}(k,\Omega)},\tag{9}$$

where $P_0^{L\backslash R}$ is the free-field pressure at the position of the left ear, $L$, or right ear, $R$, with the head absent. A measured HRTF can be aligned by translating the free-field pressure (the denominator in Eq. (9)) from the center of the head to the position of the ear by:

$$h\_a^{L\backslash R}(k,\Omega) = h^{L\backslash R}(k,\Omega) \cdot \frac{P\_0(k,\Omega)}{P\_0^{L\backslash R}(k,\Omega)}.\tag{10}$$

For a far-field HRTF, the free-field sound pressure can be computed using the plane-wave formulation as in Eq. (5), which leads to the ear-alignment formulation:

$$h\_a^{L\backslash R}(k, \Omega) = h^{L\backslash R}(k, \Omega)\, e^{-ikr\_a \cos \Theta\_{L\backslash R}},\tag{11}$$

where $\Theta_{L\backslash R}$ is the angle between the direction of the ear, $\Omega_{L\backslash R}$, and the direction of the sound source, $\Omega$, with $\cos \Theta_{L\backslash R} = \cos\theta \cos\theta_{L\backslash R} + \cos(\phi - \phi_{L\backslash R}) \sin\theta \sin\theta_{L\backslash R}$. It is important to note that this ear-alignment process is invertible, which means that going from $h^{L\backslash R}$ to $h_a^{L\backslash R}$ and back can be performed without any loss of information.
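Eq. (11) amounts to multiplying each HRTF sample by a direction-dependent, unit-magnitude phase factor. A minimal sketch (the function name and the default $r_a = 8$ cm are assumptions for illustration):

```python
import numpy as np

def ear_align(h, k, theta, phi, theta_ear, phi_ear, r_a=0.08):
    """Ear-align one HRTF sample per Eq. (11):
    h_a = h * exp(-1j * k * r_a * cos(Theta)), where Theta is the angle
    between the source direction (theta, phi) and the ear direction
    (theta_ear, phi_ear)."""
    cos_big_theta = (np.cos(theta) * np.cos(theta_ear)
                     + np.cos(phi - phi_ear) * np.sin(theta) * np.sin(theta_ear))
    return h * np.exp(-1j * k * r_a * cos_big_theta)
```

Because the factor has unit magnitude, the alignment leaves the HRTF magnitude untouched, and multiplying by the conjugate factor recovers the original HRTF exactly, illustrating the invertibility noted above.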

**Figure 2** presents an example of the SH spectrum of a KEMAR HRTF [26, 52], for the basic and ear-aligned HRTF representations. The SH spectrum, which is the energy of the SH coefficients at every order *n*, is computed as:
$$E\_n(k) = \sum\_{m=-n}^{n} \left| h\_{nm}(k) \right|^2, \tag{12}$$

#### **Figure 1.**

*Added SH order due to the dual-centering of the HRTF, $N = \lceil kr_a \rceil$, as a function of frequency ($\lceil \cdot \rceil$ is the ceiling function). Computed for the free-field HRTF with $r_a = 8$ cm and $c = 343$ m/s.*


**Figure 2.**

*Normalized SH spectra, En, of basic and ear-aligned KEMAR HRTF representations, computed according to Eq. (12). The dashed gray line represents the order at which 99% of the energy is contained.*

and normalized by the maximum value for each frequency. The figure shows how the energy of the high-order SH coefficients of the ear-aligned HRTF is significantly reduced compared to the basic HRTF. This supports the finding from Section 3.1 that the high orders of the basic HRTF originate from the translation from the origin. In particular, the order at which 99% of the energy is contained is reduced to below order 10 for all frequencies.
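The per-order energy and normalization described above can be sketched as follows, again assuming ACN-ordered coefficients for a single frequency; the function name is illustrative:

```python
import numpy as np

def normalized_sh_spectrum(h_nm, N):
    """Energy per SH order, E_n = sum_m |h_nm|^2 (cf. Eq. (12)), for
    coefficients stacked in ACN order ((n, m) -> n^2 + n + m),
    normalized by the maximum over orders, as in Figure 2."""
    h_nm = np.asarray(h_nm)
    E = np.array([np.sum(np.abs(h_nm[n * n:(n + 1) ** 2]) ** 2)
                  for n in range(N + 1)])
    return E / E.max()
```

Plotting this quantity per frequency for a basic and an ear-aligned HRTF would reproduce the kind of comparison shown in Figure 2.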

It is interesting to note that the SH order reduction of the ear-aligned HRTF can explain the reduced order of the time-alignment method. This is discussed in detail in [26, 27]. The ear-alignment can be interpreted as "virtually" removing the inherent delay in an HRTF caused by normalizing the pressure at the ear by the pressure at the origin. This is evident from Eq. (11), where the phase in the exponent represents a delay from the origin to the ear due to a source at $\Omega$. The main difference between the time-alignment and ear-alignment methods is as follows. Performing time-alignment requires numerical estimation of the time delays; this may be challenging, and its accuracy may depend on the HRTF direction and on the quality of the measurements [53, 54]. In contrast, ear-alignment can be performed parametrically with the parameters $r_a$ and $\Omega_{L\backslash R}$. Moreover, using ear-alignment with fixed parameters makes it data-independent, which improves its robustness to measurement noise (as discussed comprehensively in [29]).
