As the previous topic introduced, convolution is one of the most powerful tools we have. Not only can it make a clean signal sound as if it were recorded in a church, it can also create a virtual source that sounds as if it were placed somewhere around you. In this topic, we will talk about creating virtual sound sources in headphones, which relates to binaural rendering.
We will 'place' a sound source somewhere in a virtual space by convolving your signal with the HRTF measured at that position. First, we should cover some background on 3D audio.
- Human Sound Localization
The picture above introduces ITD, ILD, and how the ear works like a 'unique filter' for each person.
ITD: Interaural Time Difference. This term describes the difference in the time a sound wave takes to reach each ear. It arises whenever the sound source is off-center relative to the listener, so the sound travels a different distance to each ear.
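As a rough illustration of the geometry, the ITD for a spherical head is often approximated with Woodworth's formula, ITD ≈ (r/c)(θ + sin θ). A minimal sketch (the head radius and speed of sound are assumed average values, not from this post):

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate the interaural time difference (in seconds) for a
    spherical head using Woodworth's formula: (r/c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source at 0 degrees (dead center) gives zero ITD; a source at 90 degrees
# (directly to one side) gives the maximum, roughly 0.6-0.7 ms.
print(round(itd_woodworth(90) * 1e3, 3), "ms")
```

This is only a first-order model; measured HRTFs capture the real, person-specific delays.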
ILD: Interaural Level Difference (also called IID, Interaural Intensity Difference). This term describes the level difference between the ears caused by the head-shadowing effect: the far ear loses some level, especially at high frequencies whose wavelengths are shorter than the diameter of the listener's head.
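As a quick worked example of that wavelength condition: shadowing becomes significant above roughly the frequency whose wavelength equals the head diameter (the 17.5 cm diameter here is an assumed average, not from this post):

```python
c = 343.0              # speed of sound in air, m/s
head_diameter = 0.175  # assumed average head diameter, m

# Wavelengths shorter than the head diameter are shadowed by the head,
# i.e. frequencies above roughly c / d:
f_shadow = c / head_diameter
print(f"{f_shadow:.0f} Hz")
```

So ILD carries most of its localization information above roughly 2 kHz, while ITD dominates below.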
Besides ITD and ILD, many other factors are involved, such as the physical differences of the pinna, ear canal, torso, and so on, which together create the unique filter for each person.
And that is just the direct sound; you also need to think about how reflections interact with your two ears in the same way. Now you can probably see why it is so hard to localize a sound that is exactly in front of, above, or behind you: ITD and ILD barely differ in those cases, so they cannot help with localization.
- HRTF database
Many databases are offered online by professional institutes such as MIT and IRCAM. A database is a set of measurements describing how each ear responds to signals arriving from different elevations and azimuths. Typically, a dummy-head microphone simulates the two ears on a head, and the impulse response arriving from a specific direction is recorded. The measurement positions usually form a sphere around the dummy head; for example, the elevation might run from 90° down to -45° in 15° steps (positions further below can be ignored), with the same strategy applied to azimuth within each elevation layer. The result is a set of impulse responses, measured at multiple directions for both ears, that can be used for convolution later.
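The grid described above can be sketched by enumerating the positions. The fixed 15° azimuth step here is an assumption for illustration; real databases typically vary the azimuth step per elevation ring (fewer points near the poles), which is why actual sets have irregular position counts:

```python
# Elevation rings from 90 degrees down to -45 degrees in 15-degree steps.
elevations = list(range(90, -46, -15))   # 90, 75, ..., 0, ..., -45

# Assumed uniform azimuth step; real databases vary this per ring.
azimuth_step = 15
grid = [(el, az) for el in elevations for az in range(0, 360, azimuth_step)]

print(len(elevations), "elevation rings,", len(grid), "directions total")
```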
(A Helpful Printing from KEMAR, MIT)
After we get the HRTF database, we can generate binaural filters for both of our ears. Generally, you can treat a BRIR (Binaural Room Impulse Response) as the filter describing how each ear responds to sound arriving from a given angle. For example, suppose you have two filters h1 and h2 for your left and right ears, generated from the HRIR measured at 120° azimuth behind you. All you need to do is convolve your dry signal separately with h1 and h2, then route the results to your L and R channels. The result is a striking auditory illusion: your dry signal sounds as if it comes from 120° behind you, which is roughly the position of the Right Surround channel in a Dolby 5.1 speaker layout. Similarly, if you want to create a virtual multichannel layout, repeat this processing for each virtual channel, and at the end sum the signals at each ear, adding together everything rendered for that ear.
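The loop described above (convolve each virtual channel with its left/right filters, then sum per ear) can be sketched as follows. This is NumPy rather than the MATLAB used later in the post, and the HRIR data here is stand-in noise, not a real database:

```python
import numpy as np

def render_virtual_channels(dry_signals, hrir_pairs):
    """Convolve each dry channel with its (left, right) HRIR pair and
    sum everything into one binaural stereo buffer."""
    n = max(len(x) + len(hl) - 1 for x, (hl, hr) in zip(dry_signals, hrir_pairs))
    out = np.zeros((n, 2))
    for x, (h_left, h_right) in zip(dry_signals, hrir_pairs):
        left = np.convolve(x, h_left)    # left-ear signal for this channel
        right = np.convolve(x, h_right)  # right-ear signal for this channel
        out[:len(left), 0] += left       # sum per ear across channels
        out[:len(right), 1] += right
    return out

# Stand-in data: two virtual loudspeakers, 512-tap random "HRIRs".
rng = np.random.default_rng(0)
dry = [rng.standard_normal(1000) for _ in range(2)]
hrir_pairs = [(rng.standard_normal(512), rng.standard_normal(512)) for _ in range(2)]
binaural = render_virtual_channels(dry, hrir_pairs)
print(binaural.shape)  # (1511, 2): signal length + taps - 1, stereo
```

With real HRIRs, each pair would come from the database position matching that loudspeaker's angle.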
Here is a simple example implementation in MATLAB, based on the database offered by IRCAM:
First, pick one of the databases, which contains 187 impulse response files. Run the load_HRIR_WAV.m script they provide; it collects the HRIR filters into a .mat file like this:
You can see there are two individual HRIR structures, one for the left ear and one for the right. In each of them, several properties will be used later:
Generally, all you have to do is search the elevation and azimuth vectors for the requested position, which returns a single index into content_m corresponding to the HRIR filter at that position:
Then convolve your dry signal with the HRIR you just found:
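The lookup-then-convolve steps might look like this in NumPy form. The name content_m mirrors the structure described above; the elevation/azimuth vector names and all the arrays here are stand-ins for illustration, not the real .mat contents:

```python
import numpy as np

def pick_hrir(elev_v, azim_v, content_m, elev, azim):
    """Return the row of content_m whose measured elevation and azimuth
    both match the requested position."""
    idx = np.flatnonzero((elev_v == elev) & (azim_v == azim))
    if idx.size == 0:
        raise ValueError("no HRIR measured at this position")
    return content_m[idx[0]]

# Stand-in database: 4 positions, 256-tap HRIRs. A real set would be
# loaded from the .mat file produced by load_HRIR_WAV.m.
elev_v = np.array([0, 0, 15, 15])
azim_v = np.array([0, 120, 0, 120])
content_m = np.arange(4 * 256).reshape(4, 256).astype(float)

rng = np.random.default_rng(1)
dry = rng.standard_normal(1000)
h = pick_hrir(elev_v, azim_v, content_m, elev=0, azim=120)
wet = np.convolve(dry, h)  # one ear; repeat with the other ear's HRIR
print(wet.shape)  # (1255,): 1000 + 256 - 1
```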
After combining the two convolved signals into a stereo file, you will hear the sound coming from the virtual loudspeaker you just created. Remember that binaural audio generated this way must always be played over headphones; to play it back over loudspeakers, you need crosstalk cancellation, which will be explained in detail in a later post.
Here is a mosquito sound after binaural processing: