HRTF engine?

Advoc 101 Mar 23, 2010 at 22:26

Has anybody here played around with the concept of a sound engine that uses Head-Related Transfer Functions?

The concept of HRTF is that your brain can determine the direction and distance of a sound based on the amplitude AND phase discrepancies between what each ear hears. Most stereo music mixes use amplitude panning alone, and as far as I can tell the games I play do the same thing.

I’ve been playing around with the idea of writing a matrix of algorithms to calculate the phase difference at each ear using HRTF formulae, and using it as a sound rendering engine for a 3D game (a first-person game would be the best way to utilize this). Of course, it would require the user to wear headphones, because it wouldn’t work with speakers.
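
As a rough sketch of the phase half of those cues, the classic Woodworth spherical-head approximation gives the interaural time difference (ITD) as a function of source azimuth. This is only a sketch; the head radius is an assumed average, not a measured value:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature
HEAD_RADIUS = 0.0875    # m, average adult head (spherical-head assumption)

def interaural_time_difference(azimuth_rad):
    """Woodworth's spherical-head approximation of the ITD for a
    distant source at the given azimuth (0 = straight ahead,
    positive = toward the right ear; valid for |azimuth| <= pi/2)."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))

# A source 90 degrees to the side arrives at the near ear roughly
# 0.6-0.7 ms before the far ear.
itd = interaural_time_difference(math.pi / 2)
```

A full HRTF also captures spectral filtering by the outer ear, so this ITD (plus an amplitude difference) is only the starting point.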

I thought I’d prototype this in Max/MSP, see if I could compile it into an application that lets you spatialize a mono sound source in three dimensions, and then try to recreate the same program in a traditional programming language for use in whatever application might utilize it (kind of like Havok for physics). Has anybody seen or heard anything about this? Is anybody interested in the idea?

I need some feedback.

If you’re curious about what the heck I’m talking about, google “virtual haircut”, download the MP3, and give it a listen with some headphones (not earbuds).

18 Replies


Reedbeta 167 Mar 23, 2010 at 23:28

That sounds like a very interesting idea. I haven’t heard of something like that being done in a game before, but then I don’t know much about sound processing. Most games do have a pretty limited amount of CPU budget to apply to sound, so I wonder if you’d run into performance issues trying to do the HRTF processing in real time. Still, it’s worth it to do a mockup and see what the effect is like.

kvakvs 101 Mar 24, 2010 at 02:03

How are you going to get user head orientation for this?

Reedbeta 167 Mar 24, 2010 at 02:42

I’d presume it would just be whatever direction the camera is facing. Not tracking the user’s physical head. ;)

kvakvs 101 Mar 24, 2010 at 03:02


I’d presume it would just be whatever direction the camera is facing. Not tracking the user’s physical head. :)

Tracking head orientation would be mad. Imagine turning your head while listening for enemy footsteps or ambient sounds; it could add a lot to game immersion.

There should be projects that try to read webcam input to detect when the user turns their head. Examples are Xbox games that track the user’s face or silhouette, which require calibration before use (step out of the camera’s view, the game takes a shot, step back in, it takes another, then it uses the difference between the frames to find the player’s contour).

Reedbeta 167 Mar 24, 2010 at 03:42

There’s definitely been a good deal of research on head tracking, so I bet there’s some open-source stuff out there. If you’ve got a webcam it’s certainly an interesting direction to pursue.

Advoc 101 Mar 24, 2010 at 08:12

Actually, the head tracking would be the easy part. I spoke about that with my professor; compared to getting the HRTF stuff working the way it should, it would be the easier task. You’d just need sensors on the headphones that would update the positions of the ears, rotating the matrix to match.
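
A minimal sketch of what “rotate the matrix to match” could look like, assuming a simple two-ear model in the horizontal plane of listener space (the ear offset and coordinate convention here are illustrative, not from any real tracker):

```python
import numpy as np

def rotate_ears(yaw_rad, ear_offset=0.0875):
    """Rotate the two ear positions (metres, listener space) around
    the vertical axis by the tracked head yaw. Left ear starts at
    (-offset, 0), right ear at (+offset, 0)."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    rot = np.array([[c, -s],
                    [s,  c]])               # 2x2 rotation in the horizontal plane
    ears = np.array([[-ear_offset, 0.0],    # left ear
                     [ ear_offset, 0.0]])   # right ear
    return ears @ rot.T                     # rotate each ear position
```

Each frame, the sensor’s yaw reading would feed this rotation, and the HRTF lookup would use the source direction relative to the rotated ears.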

This would be an iterative process, and I might end up spending much of my spare time the next couple years at university doing it.

But I wasn’t worried too much about the movement, because if the application is a 3D game, your eyes are always fixed on the screen. It would be a different story if you had a VR helmet, for example.

The other hurdle is that HRTF as a concept is universal, but the formulae aren’t, because every person’s HRTF is different. We all have differently shaped heads, and our brains differ in how they interpret those phase and amplitude cues. What HRTF researchers use is an average value that should work for most people. I think to get around that I’d have to make a completely new set of algorithms that adjusts the matrix depending on the measurements of your head. Or, to make it simpler, just have a few different presets.

It works fine for me, but one person I spoke to said the illusion works for them wherever the sound passes, except when it goes straight in front; then it just sounds like a stereo mix rather than a phantom sound source in front of them. It could be completely different for each person.

Kenneth_Gorking 101 Mar 24, 2010 at 11:48

Hasn’t HRTF been supported by soundcards since 2000 or something? Or are you talking about something more elaborate?

Advoc 101 Mar 24, 2010 at 14:27

I think the hardware HRTF support in those soundcards is mostly for rendering multi channel audio into a proper binaural mix. What I’m talking about is different.

If you think of your typical consumer audio standards you can add dimensions to them.

If we ignore the cues we mentally process that tell us things like depth (a reverberant recording gives us a mental cue that one sound is farther away than another, etc.), since technically there is no spatialization there, just a mental trick:
Mono would be 0 dimensions: it is just a point, all the sound from one source.
Stereo would be 1 dimension, because you can plot the sound source along one line.
4.1, 5.1, 6.1, 7.1, etc. would be 2-dimensional, because you can plot a sound in X,Y but you have no height. You would have to add an elevated speaker array to get a true 3-dimensional representation of the sound.

So if you render a 5.1 mix as a binaural stereo mix using HRTF, you would probably get a half-decent illusion of where things are in 2 dimensions, but you still lose that 3rd dimension, which to me would probably ruin the entire effect, because sound doesn’t work that way.

What I’m talking about is taking a mono sound and being able to position it somewhere in 3 dimensions while only using a stereo feed, which in this case would have to be headphones.

I suppose if there are soundcards that will calculate HRTF in real time, then that would basically already do what I’m talking about. But it would have to be able to read information from the application (the position of each sound, the geometry and makeup of its surroundings, etc.) and simulate that 3D space for your brain.
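
The core of such a software renderer would be convolving the mono source with a left/right pair of head-related impulse responses (HRIRs) for the source’s direction. A minimal sketch with toy impulse responses; real ones would come from a measured set such as MIT’s KEMAR data:

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal to a stereo pair by convolving it with a
    head-related impulse response for each ear. With real measured
    HRIRs the result is a binaural mix for headphones."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs: the right ear's response is delayed by 2 samples and
# attenuated, crudely mimicking a source on the listener's left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6, 0.0])
stereo = binauralize(np.array([1.0, 0.5]), hrir_l, hrir_r)
```

In a real engine the convolution would run block-by-block (typically via FFT for speed), with the HRIR pair interpolated as the source or head moves.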

So I’m talking about doing that ^^^ at a software level,

and developing an application that would let you do it in real time, or render it to a file.

JarkkoL 102 Mar 24, 2010 at 14:55

The audio driver would only need to take the 5.1 output from the application and downmix it to two channels with HRTF, so there isn’t any special integration required, though it would of course give better results not to go through 5.1 at all. I don’t know how surround headphones work, but I think there could be something like this going on, so it might be worth checking them out.
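
What that driver-side downmix might look like, sketched under the assumption that each surround channel is treated as a virtual speaker at a fixed azimuth; the azimuth values and the `hrir_for_azimuth` lookup are illustrative, not any real driver’s API:

```python
import numpy as np

# Hypothetical azimuths (degrees) for the five main channels of a
# 5.1 layout: L, R, C, Ls, Rs.
SPEAKER_AZIMUTHS = [-30, 30, 0, -110, 110]

def downmix_to_binaural(channels, hrir_for_azimuth):
    """Fold a 5-channel surround mix down to stereo by treating each
    channel as a virtual speaker: convolve it with the HRIR pair for
    that speaker's azimuth, then sum. `hrir_for_azimuth` is a
    caller-supplied lookup returning (left_ir, right_ir)."""
    out_l = out_r = None
    for sig, az in zip(channels, SPEAKER_AZIMUTHS):
        hl, hr = hrir_for_azimuth(az)
        l, r = np.convolve(sig, hl), np.convolve(sig, hr)
        out_l = l if out_l is None else out_l + l
        out_r = r if out_r is None else out_r + r
    return out_l, out_r

# Toy check: a flat HRIR pair that passes the left ear at full level
# and the right at half level, regardless of azimuth.
flat = lambda az: (np.array([1.0]), np.array([0.5]))
l, r = downmix_to_binaural([np.ones(2)] * 5, flat)
```

This also shows why going through 5.1 loses information: the mix is already collapsed onto five fixed directions before the HRTF is applied.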

Anyway, it would be cool to have head tracking with HRTF in an FPS (: Even though you look at the screen straight-on most of the time, you can still turn your head ~30 degrees left or right while still looking at the screen. So it would be an extra-natural control for checking audio cues around you by turning your head (:

Advoc 101 Mar 24, 2010 at 15:37

I don’t know exactly what surround headphones are, but if HRTF is done right you don’t need special headphones, just half-decent quality ones. When I first listened to that silly virtual haircut thing, I kept turning to my left to see what the guy next to me was trying to say to me; I thought it was his muffled voice that I couldn’t hear well because of the headphones. Turns out it was the guy in the recording mumbling to himself in the corner. If you’ve got a really good HRTF signal, when you wear the headphones it should sound like they are not even there, and that the sounds you hear are coming from the room around you rather than from the headphones you are wearing. But the idea behind it doesn’t need special surround-sound headphones.

Anyway, it would be cool to have head tracking with HRTF in an FPS (: Even though you look at the screen straight-on most of the time, you can still turn your head ~30 degrees left or right while still looking at the screen. So it would be an extra-natural control for checking audio cues around you by turning your head (:

good point.
My prof is actually excited about this aspect of it and I’ve got all the tools of the university available to me.

JarkkoL 102 Mar 24, 2010 at 15:52

Yes, I know you don’t need special headphones; I checked the virtual barber with my regular ones after all, which worked great for me (: The point was that surround headphones might actually have HRTF built into the hardware, so it might be good to check them out and maybe see if you can find some more info. I would be interested to know the math behind HRTF, to maybe implement it one day in my game engine (:

Advoc 101 Mar 24, 2010 at 16:43

Oh I see what you’re saying.

I’ll share what I find.

jirka 101 Apr 10, 2012 at 06:30

This is a subject I have been pondering myself for the last month or two, especially its application to head tracking. Since it seems at this point that others may be better suited to developing the technology, I will share my ideas and discoveries here.

First up is Slab3d, which performs spatial 3D-sound processing allowing the arbitrary placement of sound sources in auditory space. It is released free to the public under the NASA Open Source Agreement.
I just found this about an hour ago, but my initial testing with it shows promise.

My first suggestion is looking into using the Microsoft Kinect for both head tracking and possibly even head scanning for personalised HRTFs. With some (admittedly difficult) work, this could be a very slick and user-friendly solution.

My second suggestion for further development, after getting the head tracking and audio processing working, is to replace the headphones with a pair of targeted parametric speaker arrays. At this point we would already have ear coordinates, and may be able to exploit our existing DSP framework to neutralise the newly introduced natural HRTF effects and replace them with those from the virtual environment. The use of a microphone to capture and compensate for room response would also be useful.

My dream would be to see an open standard/api/whatever to make fully immersive head-tracked virtual audio easy to incorporate into any software/hardware product. Games are the obvious choice for this technology but it could also be used to improve immersion and lower fatigue when listening to music with headphones, or any other application where a virtual audio environment would be useful.

Good luck with your research and development. I would have loved to have worked on this myself, but in truth I don’t really have the programming or academic experience necessary for something like this. So here I am, setting my ideas free into the public domain, where I pray they will blossom and prosper with the help of others. Please keep us up to date on your progress.

Stainless 151 Apr 10, 2012 at 08:27

3D audio has been around for ages, and works very well.

Head tracking can be done with a webcam and some software: you run a few filters on the input image and reduce it to a black-and-white image with blobs for the eyes, nostrils and mouth. That’s the only tricky bit; the filters can be affected by ambient light conditions and give false readings if you are not careful.

As far as I remember it I did….

Background removal, grey scale, rectify, sharpen, blob detect.

I may have missed a filter; it was a few years ago now. I stopped working in this field when the company ran out of money to pay me.
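
A minimal sketch of that filter chain in plain NumPy, reducing “blob detect” to a single bounding box; a real tracker would segment individual blobs for the eyes, nostrils and mouth:

```python
import numpy as np

def track_blob(frame, background, threshold=0.2):
    """Toy version of the chain above: grey scale, background removal,
    threshold to a black-and-white mask, then locate the bright blob
    by its bounding box. `frame` and `background` are HxWx3 arrays
    with values in [0, 1]. Returns (x0, y0, x1, y1) or None."""
    grey = frame.mean(axis=2)            # grey scale
    bg = background.mean(axis=2)
    diff = np.abs(grey - bg)             # background removal
    mask = diff > threshold              # binary (black & white) image
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```

The rectify and sharpen steps from the original list are omitted here; they would sit between the greyscale and threshold stages.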

The other thing you can try is a 3D camera. They are not that expensive anymore and give excellent results.

alphadog 101 May 02, 2012 at 13:31

Why head-track with a webcam? What of people without one?

You could check time of arrival and time difference of arrival relative to the sound source. Grab a baseline by asking the user to hold steady relative to the screen, sample, then calibrate by asking them to turn their head, etc. Thinking out loud here…

Or, lots of phones have accelerometers. Ask them to calibrate by putting their phones on their heads. Hmm, excuse me while I go visit the trademark and patent offices… :)

geon 101 May 02, 2012 at 14:22

Unless you have a head-mounted display, why head-track at all? You’ll be staring straight ahead at the monitor all the time.

(I know some of the VR helmets of the early ’90s had integrated head tracking, possibly via magnetic field sensors or something.)

Stainless 151 May 02, 2012 at 14:29

Yes, I think they used FETs.

The headset was so heavy though, and the strap attachment had to be so tight it was akin to torture.

There are some very good direct injection headsets now, though they tend to generate a virtual screen some distance in front of you.

rouncer 103 May 06, 2012 at 04:14

It’s a simple idea; the head tracking is the hardest bit. The HRTF itself is just phase delays of 30 samples or less, or something like that.
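
For what it’s worth, that 30-sample figure checks out: the maximum interaural time difference for an average head is roughly 0.66 ms, which at a 44.1 kHz sample rate comes to about 29 samples.

```python
# Maximum ITD for an average head expressed in samples at CD rate.
SAMPLE_RATE = 44100       # Hz
MAX_ITD_SECONDS = 0.00066 # ~max interaural time difference, average head

max_delay_samples = MAX_ITD_SECONDS * SAMPLE_RATE  # ~29 samples
```

The phase delay alone isn’t the whole story, though; the spectral shaping from the outer ear is what carries the front/back and elevation cues.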