**4. Binaural reproduction based on bilateral ambisonics and ear-aligned HRTFs**

#### *Advances in Fundamental and Applied Research on Spatial Audio*

While the ear-alignment method leads to an efficient SH representation of the HRTF, incorporating the pre-processed ear-aligned HRTF in a binaural reproduction process is not trivial. The computation of the binaural signal (Eq. (3)) requires the HRTF and the Ambisonics signals to be represented in the same coordinate system and around the same origin. One way to align them is to re-synthesize the HRTF phase before the computation of the binaural signal, which will increase its order back to the original high order and will cause a truncation error similar to that in the Basic Ambisonics reproduction. Another way is to use the MagLS approach, which completely ignores the HRTF phase component at high frequencies [31]. Alternatively, the Binaural B-Format approach, presented by Jot *et al.* [38], can be used. In this approach, two B-Format recordings at the ear locations are used, together with a minimum-phase approximation of the HRTF and an ITD estimation based on a spherical head model. The Binaural B-Format can be extended by using the ear-aligned HRTF together with high-order Ambisonics signals that are defined around the ear locations. This approach is denoted as Bilateral Ambisonics reproduction [41, 42].

Assuming that the plane-wave amplitude density function, denoted by $a^{L\backslash R}(k,\Omega)$, is given at the position of the ear, then the binaural signal can be computed directly at the listener's ears, using the ear-aligned HRTF, similarly to Eq. (1):

$$p^{L\backslash R}(k) = \int\_{\Omega} a^{L\backslash R}(k,\Omega)\, h\_a^{L\backslash R}(k,\Omega)\, d\Omega. \tag{13}$$

From here, the Bilateral Ambisonics reproduction of order *N* can be formulated as:

$$p^{L\backslash R}(k) = \sum\_{n=0}^{N} \sum\_{m=-n}^{n} \left[\tilde{a}\_{nm}^{L\backslash R}(k)\right]^{\*} h\_{a,nm}^{L\backslash R}(k),\tag{14}$$

where $\tilde{a}_{nm}^{L\backslash R}(k)$ and $h_{a,nm}^{L\backslash R}(k)$ are the SH coefficients of $\left[a^{L\backslash R}(k,\Omega)\right]^{*}$ and $h_a^{L\backslash R}(k,\Omega)$, respectively. It is important to note that, in contrast to $a(k,\Omega)$, which is the plane-wave amplitude density function of the sound-field as observed at the position of the center of the head, $a^{L\backslash R}(k,\Omega)$ is observed at the position of the ears. **Figure 3** demonstrates the differences between the two coordinate systems. The standard coordinate system, denoted by black dashed axes with its origin at the center of the head, is used for the computations of the binaural signals in Eqs. (1) and (3) using the Basic Ambisonics signals, $\tilde{a}_{nm}(k)$, for both ears. The bilateral coordinate systems, denoted by red dotted axes with their origins at the positions of the ears, are used for the computation in Eq. (14) using the Bilateral Ambisonics signals, $\tilde{a}_{nm}^{L\backslash R}(k)$, which are different for each ear. **Figure 4** demonstrates the signal-flow of the Basic and Bilateral Ambisonics.
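As an illustration, Eq. (14) reduces to a single inner product per frequency bin and ear between the conjugated Bilateral Ambisonics coefficients and the ear-aligned HRTF coefficients. The following minimal sketch assumes an ACN-style $(n,m)$ ordering of the coefficient vectors; the function names and data layout are illustrative, not from the chapter:

```python
import numpy as np

def bilateral_binaural_bin(a_nm, h_nm):
    """Eq. (14) for one ear at a single wavenumber k: the sum over (n, m) of
    conj(Ambisonics coefficient) times ear-aligned HRTF coefficient.

    a_nm, h_nm: complex vectors of length (N + 1)**2, SH coefficients ordered
    (n, m) = (0, 0), (1, -1), (1, 0), (1, 1), ...
    """
    assert a_nm.shape == h_nm.shape
    return np.vdot(a_nm, h_nm)  # np.vdot conjugates its first argument

def render(a_nm_freq, h_nm_freq):
    """Full spectrum for one ear: one inner product per frequency bin.

    a_nm_freq, h_nm_freq: arrays of shape (num_bins, (N + 1)**2).
    """
    return np.einsum('fq,fq->f', a_nm_freq.conj(), h_nm_freq)
```

Running this once per ear, followed by an inverse Fourier transform over the frequency bins, yields the time-domain binaural signals.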

Theoretically, the plane-wave amplitude density function at the position of the ear can be computed from the center function by translation of the sound-field [46], which is computed as $a^{L\backslash R}(k,\Omega) = a(k,\Omega)\, e^{ikr_a \cos\Theta^{L\backslash R}}$; however, this will lead to equivalence between Eq. (13) and Eq. (1), which means that the binaural signals

#### **Figure 3.**

*Diagram of the standard (a) and Bilateral (b) coordinate systems. The origin of the standard coordinate system is at the center of the head, while in the bilateral coordinate system the origin is at the position of the ear.*

**Figure 4.** *Binaural reproduction signal-flow of the Basic (a) and Bilateral (b) Ambisonics.*

will be identical. Thus, the same truncation error as in the Basic Ambisonics reproduction is introduced. Alternatively, if a low-order plane-wave amplitude density function is given directly at the position of the ear, the Bilateral Ambisonics-based signals (from Eq. (14)) may potentially be more accurate than the Basic Ambisonics reproduction (from Eq. (3)) due to the lower-order nature of the ear-aligned HRTF compared to the unprocessed basic HRTF.

**Figure 5** demonstrates the improved accuracy of the Bilateral Ambisonics reproduction. The figure shows the magnitude response of the binaural signals for a single plane-wave of unit amplitude arriving from direction $(\theta,\phi) = (90^{\circ}, 20^{\circ})$, using a KEMAR HRTF, with $N = 1, 4$, and a high-order reference of $N = 40$. For the low-order signals computed using Basic Ambisonics reproduction, a high-frequency roll-off above the sphere cut-off frequency, $kr_a = N$ [14], is clearly observed. This is discussed further in [19]. Additionally, amplitude distortion is also observed at these high frequencies. The Bilateral Ambisonics-based signals seem significantly more accurate in terms of both frequency roll-off and distortion, where reproduction of order $N = 4$ seems to preserve the signal magnitude up to almost 20 kHz, including the important spectral cues (peaks and notches). Further evaluation of the performance of the Bilateral Ambisonics reproduction is presented in Section 6.

**Figure 5.**

*Magnitude of a left ear binaural signal of a single plane-wave from direction* $(\theta,\phi) = (90^{\circ}, 20^{\circ})$*, with HRTF of KEMAR, computed with Basic Ambisonics reproduction (solid lines) and with Bilateral Ambisonics reproduction (dashed lines), with* $N = 1, 4$*, compared to a high-order reference with* $N = 40$*.*

## **5. Head-tracking in bilateral ambisonics reproduction**

While Bilateral Ambisonics leads to a more efficient representation of the spatial audio signal and more accurate binaural reproduction, such a procedure will result in a static binaural reproduction. In contrast to the Basic Ambisonics reproduction, where head-rotations can be incorporated in post-processing by a simple rotation of the Ambisonics signals using Wigner-D functions [55], performing this operation in Bilateral Ambisonics is not straightforward. A method to incorporate head-rotations in Bilateral Ambisonics reproduction is presented in this section.

Consider the specific case where a binaural signal is played via headphones to a listener, representing a spatial acoustic scene composed of a single sound source. According to the Bilateral Ambisonics format, the scene is represented by two Ambisonics signals with their origins at the listener's expected ear positions, as seen in **Figure 6a**. Note that the microphone symbols in **Figure 6** represent the origins of the left and right Ambisonics signals. Upon playback of the acoustic scene, the listener is expected to perceive a virtual source from the direction of the real source (in this example, about $30^{\circ}$ to the left). Next, the listener rotates his/her head while

#### **Figure 6.**

*Demonstration of the head-tracking method in bilateral coordinate systems: (a) before head-rotation, (b) after head-rotation and without head-tracking, and (c) after head-rotation and with head-tracking.* $\vec{r}_a$ *and* $\vec{r}_b$ *are the ear vectors before and after head-rotation, respectively.*

#### *Binaural Reproduction Based on Bilateral Ambisonics DOI: http://dx.doi.org/10.5772/intechopen.100402*

listening; this action will result in the virtual source changing its position in space, remaining at about $30^{\circ}$ to the left, as illustrated in **Figure 6b**. One way to compensate for the head-rotation is to acquire new Bilateral Ambisonics signals located at the new positions of the listener's ears, and also to rotate them according to the angle of rotation of the listener's head, as illustrated in **Figure 6c**. This, of course, may not be a practical option, since acquiring new Bilateral Ambisonics signals requires re-synthesizing the sound-field, in the case of a simulation, or adjusting the position of the physical microphone arrays, in the case of sound-field capture. The former may be computationally expensive, while the latter is practically infeasible, since recording is typically performed independently from the listener's head orientation. Note that a multi-microphone binaural recording method could be employed, similar to the Motion-Tracked Binaural recording method [56], though this solution would be complex in terms of the recording resources. Hence, developing methods that compensate for the listener's head movements using head-tracking is of great importance for Bilateral Ambisonics recording and reproduction. **Figure 6** shows that, as a result of the rotation in this case, the ears (which are the Ambisonics reference points) change their orientation while also translating in space. Proper compensation for head-rotation needs to take both movements into account.

Now, consider the general case, where an arbitrary sound-field is represented by a plane-wave amplitude density function, denoted by $a(k,\Omega)$, given at the position of the ear. Note that $a(k,\Omega)$ represents the same function as $a^{L\backslash R}(k,\Omega)$ from Eq. (13), but the superscript $L\backslash R$ is left out for notation simplicity, since the operation is similar for both ears. Assuming that the ear position with respect to the head center is known before and after the rotation, denoted as $\vec{r}_a$ and $\vec{r}_b$, respectively, head-tracking can be performed by translation of the plane-wave amplitude density function $a(k,\Omega)$, accordingly. This translation can be performed by a phase-shifting operation, as follows [46, 57]:

$$a^t(k,\Omega) = a(k,\Omega)e^{-i\vec{k}\cdot\left(\vec{r}\_b - \vec{r}\_a\right)} = a(k,\Omega)e^{-ikr\_a(\cos\Theta\_b - \cos\Theta\_a)},\tag{15}$$

where $a^t(k,\Omega)$ is the translated plane-wave amplitude density function, which represents the plane-wave amplitude density around the ear of the listener after head-rotation (but with the pre-rotation orientation). $r_a = |\vec{r}_a|$ is the head radius, $\vec{k}$ is the wave vector, $\Theta_a$ is the angle between the sound source direction, $\Omega$, and the pre-rotation ear position, $\vec{r}_a$, and $\Theta_b$ is the angle between $\Omega$ and $\vec{r}_b$. **Figure 7**

#### **Figure 7.**

*Schematic illustration of the left ear microphone array translation due to head-rotation:* $\vec{r}_a$ *is the left ear vector with respect to the head center,* $\vec{r}_b$ *is the left ear vector after a clock-wise rotation, and* $\vec{k}$ *is the wave vector of the plane-wave, where* $\Theta_a$ *and* $\Theta_b$ *represent the angles between the ear vectors and the wave vector.*

demonstrates this translation for a simple case where the sound-field is comprised of a single plane-wave and the microphone symbols represent the measurement position of $a(k,\Omega)$, made by microphone arrays.
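The translation in Eq. (15) amounts to a per-direction phase shift of the plane-wave amplitudes. A minimal sketch follows; the wave-vector sign convention is an assumption here (taken so that $\vec{k}\cdot\vec{r} = kr\cos\Theta$, with $\Theta$ the angle between the source direction and $\vec{r}$, matching Eq. (15)), and the function and argument names are illustrative:

```python
import numpy as np

def translate_pw_density(a, k, pw_dirs, r_a, r_b):
    """Eq. (15): phase-shift each plane-wave component of a(k, Omega)
    from the pre-rotation ear position r_a to the new ear position r_b.

    a        : complex plane-wave amplitudes a(k, Omega_j), shape (J,)
    k        : wavenumber in rad/m
    pw_dirs  : unit vectors of the source directions Omega_j, shape (J, 3)
    r_a, r_b : ear position vectors before/after head-rotation, shape (3,)

    Flip the sign of k_vec below if your convention defines the wave vector
    along the propagation direction rather than the arrival direction.
    """
    k_vec = k * np.asarray(pw_dirs)  # one wave vector per plane-wave direction
    phase = np.exp(-1j * k_vec @ (np.asarray(r_b) - np.asarray(r_a)))
    return np.asarray(a) * phase
```

Note that the operation only modifies phases: the magnitude of each plane-wave component is preserved, and for $\vec{r}_b = \vec{r}_a$ it reduces to the identity.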

Next, the orientation of the translated plane-wave amplitude density function is corrected by applying rotation. This is formulated in the SH domain by:

$$a\_{nm}^r(k) = \sum\_{m'=-n}^{n} a\_{nm'}^t(k)\, D\_{mm'}^{n}(\alpha, \beta, \gamma), \tag{16}$$

where *a<sup>t</sup> nm*ð Þ*<sup>k</sup>* are the SH coefficients of *at* ð Þ *<sup>k</sup>*, <sup>Ω</sup> , *<sup>D</sup><sup>n</sup> mm*<sup>0</sup> denotes the Wigner D functions, and *a<sup>r</sup> nm* are the rotated Bilateral Ambisonics signals. ð Þ *α*, *β*, *γ* are the Euler angles [58] of the head-rotation, which are assumed to be known, for example from a head-tracker. Note that this procedure needs to be applied to both left and right ears.

In practice, the Bilateral Ambisonics signals will be order limited due to the constraints mentioned in Section 2. The finite order representation, in turn, leads to limitations in the accuracy of the suggested method. These limitations will be presented and demonstrated in numerical simulations in Section 6.
