### **1.1 Head-related transfer function (HRTF)**

The Head-Related Transfer Function (HRTF) is defined as the acoustic transfer function, in a free field (a field without any reflections), from the sound acquired at the center point of the head position when the listener is absent to the sound acquired at the listener's ear [1]. An illustration is given in **Figure 1**. As in **Figure 1(a)**, a microphone is placed at the position of the center of the subject's head with the subject absent. Using the *z*-transform, the output $Y_{\mathrm{A}}(z)$ obtained in response to the input $X(z)$ is:

$$Y_{\mathrm{A}}(z) = M(z) \cdot H_{\mathrm{A}}(z) \cdot S(z) \cdot X(z), \tag{1}$$

where $M(z)$ and $S(z)$ are the system functions of the microphone and the loudspeaker, respectively, and $H_{\mathrm{A}}(z)$ denotes the acoustic path from the loudspeaker to the microphone in this configuration.

**Figure 1.**

*Definition of head-related transfer function. (a) Obtaining the response $Y_{\mathrm{A}}(z)$ at the center of a subject's head with the subject absent. (b) Obtaining the response $Y_{\mathrm{E}}(z)$ at the ear.*

As in **Figure 1(b)**, the same microphone is then placed at the subject's ear. The output $Y_{\mathrm{E}}(z)$ is obtained as the response to the same input $X(z)$ fed to the same loudspeaker:

$$Y_{\mathrm{E}}(z) = M(z) \cdot H_{\mathrm{E}}(z) \cdot S(z) \cdot X(z). \tag{2}$$

The *z*-transform of the HRTF, $H(z)$, is obtained from $Y_{\mathrm{A}}(z)$ and $Y_{\mathrm{E}}(z)$ as follows:

$$H(z) = \frac{Y_{\mathrm{E}}(z)}{Y_{\mathrm{A}}(z)} = \frac{H_{\mathrm{E}}(z)}{H_{\mathrm{A}}(z)}. \tag{3}$$

Computing Eq. (3) cancels the system functions $M(z)$ and $S(z)$ when the same microphone and loudspeaker are used for both measurements, except where either of these system functions has zeros. The HRTF is obtained as $H(z)|_{z = \exp(j\omega)}$, where $j$ is the imaginary unit, $\omega = 2\pi f$ is the angular frequency, and $f$ is the frequency. The time-domain representation (impulse response) corresponding to $H(z)$ is called the Head-Related Impulse Response (HRIR).
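The cancellation in Eq. (3) can be checked numerically. The following is a minimal sketch, not a measurement procedure from this chapter: it assumes NumPy, models every system as a short FIR filter, and uses hypothetical names (`simulate_measurement`, `estimate_hrtf`) and made-up filter coefficients. The ratio is evaluated on the FFT grid, that is, at $z = \exp(j\omega_k)$ with normalized $\omega_k = 2\pi k / N$.

```python
import numpy as np


def simulate_measurement(x, s, h, m):
    """Pass input x through loudspeaker s, acoustic path h, and microphone m.

    Each stage is a linear convolution, which corresponds to the product
    M(z) * H(z) * S(z) * X(z) in Eqs. (1) and (2).
    """
    y = np.asarray(x, dtype=float)
    for g in (s, h, m):
        y = np.convolve(y, g)
    return y


def estimate_hrtf(y_a, y_e, n_fft=None, eps=1e-12):
    """Return (H, hrir) with H = Y_E / Y_A sampled on the unit circle (Eq. (3)).

    n_fft must be at least the length of both recordings so that the
    DFT products match the linear convolutions exactly.
    """
    n_fft = n_fft or max(len(y_a), len(y_e))
    Y_a = np.fft.rfft(y_a, n_fft)
    Y_e = np.fft.rfft(y_e, n_fft)
    if np.any(np.abs(Y_a) < eps):
        # Zeros of M, S, X, or H_A make the ratio undefined at those bins.
        raise ValueError("Y_A has (near-)zero bins; Eq. (3) cannot be evaluated there")
    H = Y_e / Y_a
    hrir = np.fft.irfft(H, n_fft)  # time-domain counterpart (HRIR)
    return H, hrir


# Hypothetical example values, chosen only for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # test signal X(z)
m = np.array([1.0, 0.5])             # microphone M(z)
s = np.array([1.0, -0.3, 0.1])       # loudspeaker S(z)
h_a = np.array([0.0, 0.0, 0.5])      # path to head center: delay 2, gain 0.5
h_e = np.array([0.0, 0.0, 0.0, 0.3, 0.2, -0.1])  # path to the ear

y_a = simulate_measurement(x, s, h_a, m)
y_e = simulate_measurement(x, s, h_e, m)
H, hrir = estimate_hrtf(y_a, y_e)
print(np.round(hrir[:6], 6))
```

Because `h_a` is a pure delay and gain in this toy setup, the recovered HRIR is simply `h_e` advanced by two samples and divided by 0.5, independent of `m`, `s`, and `x`. With real recordings, noise and near-zero bins in $Y_{\mathrm{A}}$ usually call for some form of regularization, which this sketch does not attempt.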

The HRTF varies with the sound source position and shows strong individuality in both objective and subjective senses. Ideally, therefore, a set of HRTFs is acquired for each individual in all sound source directions. Although an efficient sampling scheme for HRTF measurement has been studied [2], the data size of such a set of HRTFs can become large. Many datasets containing the HRTFs (HRIRs) of multiple subjects in multiple sound source directions also exist [3–6], but individualization using these datasets appears to be difficult.
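To give a rough sense of how the data size grows, the sketch below estimates the storage needed for one subject's HRIR set. All numbers are assumptions chosen for illustration (a hypothetical grid of 1008 directions, 512-tap HRIRs, 32-bit samples); they are not taken from the cited datasets.

```python
def hrir_set_size_bytes(n_directions, n_taps, n_ears=2, bytes_per_sample=4):
    """Storage for one subject's HRIR set: directions x ears x taps x sample size."""
    return n_directions * n_ears * n_taps * bytes_per_sample


# Hypothetical example: 1008 directions, 512 taps per HRIR, float32 samples.
size = hrir_set_size_bytes(n_directions=1008, n_taps=512)
print(f"{size / 2**20:.2f} MiB per subject")
```

The total scales linearly with the number of subjects and with any additional measurement dimension, such as source distance.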

### **1.2 Virtual auditory display (VAD) utilizing head-related transfer functions**

Virtual Auditory Displays (VADs), which are devices or equipment for presenting the auditory impression of a certain sound field to a listener, have been developed since
