Principle Component Analysis of Audio Data

Advoc 101 Nov 21, 2011 at 17:59

I’m having a bit of difficulty here.

I’m attempting to build a Finite Impulse Response filter to use in a Virtual 3D environment (for the sake of argument we should just call it a game), using a measured set of Impulse responses.

I’m trying to build a pair of Head-Related Transfer Function (HRTF) filters that can be used with headphones to help a player localize the 3D position of audio in the real world around him/her. It’s an illusion the brain perceives: you can actually pinpoint an audio source “out-of-head” in the real space around you, rather than feeling like it’s inside your head. (Consumer products would just call it 3D sound.)

I have a set of measurements, impulse responses, taken of a KEMAR dummy mannequin in an anechoic chamber at MIT. There’s a library of pairs of impulse responses, one for each ear, for over 700 positions. They’re grouped by elevation, and there is a measurement for each multiple of 5 degrees of azimuth. The elevations are set every 10 degrees starting at 0 degrees.

Principle Component Analysis uses a bunch of Stats methods that I’m not really all that familiar with, but I have a mathematician helping me with my research. But I understand a bit of Regression analysis and that will be a big part.

The impulse responses are in raw data audio format. That’s great for mapping and creating a convolution filter, but I don’t want to convolve the audio with the impulse responses, I want to derive a mathematical function from the audio data. If we look at the measurements as a sample of a population, where the population would be any point between measurements, as well as the measurements themselves, then they act kind of like a standard deviation model. My big question for those of you who have done Principle Component Analysis before: how the hell do I get raw Audio data set as some sort of Data Set for statistical analysis? I’m so stuck on this and I have no idea how to proceed.

3 Replies

Reedbeta 167 Nov 21, 2011 at 18:49

I don’t understand what you’re asking for. You have impulse responses for a large point cloud and you want to find a mathematical function that approximates them well as a continuous function of position, correct? What are you trying to do principal component analysis for? (Note: not “principle”. :)) And what is the “raw Audio data set” you’re talking about in your last sentence?

It seems like what you would want to do is work up a mathematical function with various parameters and use regression analysis to fit those parameters to the data. Or alternatively, pick a set of basis functions and compute a decomposition as a linear combination of them; spherical harmonics might be appropriate.
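As a rough illustration of the basis-function idea, here is a minimal sketch that fits one scalar quantity per measurement direction as a linear combination of the real spherical harmonics up to degree 1 (which, up to normalization, are just 1, x, y and z on the unit sphere). The grid, the function name `real_sh_basis`, and the synthetic "measured" values are all assumptions made up for the example, not the actual KEMAR data.

```python
import numpy as np

def real_sh_basis(azimuth_deg, elevation_deg):
    """Unnormalized real spherical harmonics up to degree 1: 1, x, y, z."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    return np.column_stack([np.ones_like(x), x, y, z])

# Hypothetical grid: azimuth every 5 degrees, elevation every 10 degrees from 0.
az_grid, el_grid = np.meshgrid(np.arange(0, 360, 5), np.arange(0, 90, 10))
azimuths = az_grid.ravel().astype(float)
elevations = el_grid.ravel().astype(float)

# Synthetic stand-in for one measured quantity per direction (an assumption).
true_coeffs = np.array([0.5, 0.0, 2.0, -0.3])
measured = real_sh_basis(azimuths, elevations) @ true_coeffs

# Least-squares decomposition onto the basis functions.
design = real_sh_basis(azimuths, elevations)
coeffs, *_ = np.linalg.lstsq(design, measured, rcond=None)

# The fitted coefficients give a value at a direction that was never measured.
predicted = real_sh_basis(np.array([12.5]), np.array([15.0])) @ coeffs
```

Real data would likely need higher-degree harmonics, and the same fit would be repeated for each parameter (or each filter tap) that has to vary with direction.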

Advoc 101 Nov 21, 2011 at 19:15

I apologize for my awful articulation. I’m burning the candle at both ends and am currently fueled by monster soda.
@Reedbeta

You have impulse responses for a large point cloud and you want to find a mathematical function that approximates them well as a continuous function of position, correct?

YES

What are you trying to do principal component analysis for?

I need the principal components to use in a mathematical function that represents the main parameters of the audio. I don’t know how else to do this. I would like to create an F.I.R. filter (a pair, one for each ear). With the data in its current form I can only use a convolution filter to get the appropriate audio output, which is more taxing on the CPU and only allows discrete points rather than a continuous function of position, as you put it.
All of the research I’ve done so far into other people’s projects points in this direction.
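For reference, here is a minimal sketch of what applying a pair of measured impulse responses as FIR filters looks like. Applying an FIR filter is itself a convolution of the input with the tap values, so the sketch checks the direct form against `np.convolve`. The impulse responses below are synthetic placeholders and the function names are made up for the example.

```python
import numpy as np

def apply_fir(signal, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * signal[n - k]."""
    output = np.zeros(len(signal) + len(taps) - 1)
    for k, tap in enumerate(taps):
        # Tap k contributes a copy of the input delayed by k samples.
        output[k:k + len(signal)] += tap * signal
    return output

def render_binaural(mono, left_ir, right_ir):
    """Filter one mono source with a left/right impulse response pair."""
    return np.stack([apply_fir(mono, left_ir), apply_fir(mono, right_ir)])

rng = np.random.default_rng(2)
mono = rng.normal(size=1000)

# Synthetic stand-ins for one measured left/right pair (assumption).
n = np.arange(64)
left_ir = np.exp(-n / 8.0) * np.sin(0.5 * n)
right_ir = np.concatenate([np.zeros(5), 0.6 * left_ir[:-5]])  # later and quieter

stereo = render_binaural(mono, left_ir, right_ir)
matches_convolution = np.allclose(stereo[0], np.convolve(mono, left_ir))
```

A continuous function of position would then have to produce the tap values (or some compact description of them) for any azimuth and elevation, rather than replace the filtering step itself.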

And what is the “raw Audio data set” you’re talking about in your last sentence?

Semantics issue, I meant “set” as a verb. Poor word choice.

It seems like what you would want to do is work up a mathematical function with various parameters and use regression analysis to fit those parameters to the data.

This is backwards from how I understood Regression analysis to work. I thought you had the data expressed on a graph or something similar and you derive the mathematical function from a line of best fit?

Basically I have to represent a bunch of impulse responses that look like this, in a way that I could do an analysis on them:

[Image: plot of the impulse responses]

Most of the articles barely say anything beyond “We did PCA on it”, and the PCA tutorials have nothing audio-specific, because it’s not strictly an audio process and is used mostly for stats.
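One common way to turn a set of impulse responses into a data set for PCA is to treat each response as one observation: stack them as rows of a matrix (one row per measured position, one column per sample), subtract the mean response, and take a singular value decomposition. The sketch below does that on synthetic data. The array sizes, the function name, and the made-up responses are assumptions for illustration only; the real files would have to be loaded into the `responses` array first.

```python
import numpy as np

def pca_of_impulse_responses(responses, n_components):
    """responses: (n_positions, n_taps) array, one impulse response per row."""
    mean_response = responses.mean(axis=0)
    centered = responses - mean_response
    # Rows of vt are the principal components, each one a "basis filter".
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Per-position weights: projection of each response onto the components.
    weights = centered @ components.T
    explained_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return mean_response, components, weights, explained_ratio

# Synthetic stand-in for the measured set (assumption): every response is a
# mix of two decaying sinusoids plus a little noise.
rng = np.random.default_rng(0)
n_positions, n_taps = 72, 128
t = np.arange(n_taps)
basis = np.stack([np.exp(-t / 20.0) * np.sin(0.3 * t),
                  np.exp(-t / 35.0) * np.sin(0.7 * t)])
true_weights = rng.normal(size=(n_positions, 2))
responses = true_weights @ basis + 0.001 * rng.normal(size=(n_positions, n_taps))

mean_response, components, weights, explained_ratio = pca_of_impulse_responses(responses, 2)

# Each response is approximated as mean + weighted sum of the components.
reconstructed = mean_response + weights @ components
max_error = np.max(np.abs(reconstructed - responses))
```

With this layout, the position-dependent part is reduced to a few weights per position, and those weights are what a regression or interpolation over azimuth and elevation would then model.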

Reedbeta 167 Nov 21, 2011 at 21:16

Well, I don’t know all that much about PCA, but it doesn’t seem to me that it has anything to do with your problem. Maybe there’s some way of applying it that I don’t know about.

Regression analysis requires a model to start with. This is a mathematical function with a few unknown parameters. For linear regression the model would be a line (y = mx + b), and the parameters would be the slope and intercept (m and b). More generally you could use any functional form with a set of parameters, such as a polynomial (the coefficients would be the parameters), a Gaussian (mean and variance are the parameters), etc. This is nonlinear regression, which is harder than linear regression, but the idea in all cases is to find the parameter values that cause the function to match the data best. Regression analysis does not find the form of the function for you, though; you must decide that and then see how well the curve can be made to fit. If it doesn’t fit well you might try a different function.
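To make the "choose a model, then fit its parameters" workflow concrete, here is a small sketch that fits two candidate models to the same synthetic data and compares how well each one matches. The data and the helper name `fit_polynomial` are made up for the example; `np.polyfit` does a least-squares fit, which works here because a polynomial is linear in its coefficients.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares fit of a polynomial model; returns coefficients and RMS error."""
    coeffs = np.polyfit(x, y, degree)  # highest power first
    rms = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return coeffs, rms

# Synthetic data that actually follows a quadratic (assumption for the demo).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 0.5 * x ** 2 - 1.0 * x + 2.0 + 0.1 * rng.normal(size=x.size)

# Model 1: a line, y = m*x + b. The fit finds m and b, not the form.
line_coeffs, line_rms = fit_polynomial(x, y, 1)

# Model 2: a quadratic. Same procedure, different functional form.
quad_coeffs, quad_rms = fit_polynomial(x, y, 2)

print(f"line RMS error:      {line_rms:.3f}")
print(f"quadratic RMS error: {quad_rms:.3f}")
```

The line leaves a large residual because its form is wrong for this data, which is the signal to try a different function.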

The graphs you posted, are those time series data for a single source location? It looks like they might be well-approximated by a sum of sine waves multiplied by an exponentially decaying envelope, all translated in time to get the initial delay. A Laplace transform might help you here. Then, if you can come up with a model that describes an individual FIR well, you can try to find a meta-model that predicts the parameters for the FIR model based on azimuth and elevation angles. This is where spherical harmonics might come into play.
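Here is a minimal sketch of fitting that kind of model to one response, simplified to a single decaying sinusoid with an initial delay. The delay, decay rate and frequency are found by brute-force search over small grids, and for each candidate the two amplitudes are solved by linear least squares. The test response, the grids, and the function names are assumptions for illustration; a real response would probably need several sinusoids and a proper nonlinear optimizer.

```python
import numpy as np

def damped_sine_basis(n_taps, delay, decay, omega):
    """Columns: decaying cosine and sine that start at sample `delay`."""
    t = np.arange(n_taps) - delay
    envelope = np.where(t >= 0, np.exp(-decay * np.maximum(t, 0)), 0.0)
    return np.column_stack([envelope * np.cos(omega * t),
                            envelope * np.sin(omega * t)])

def fit_damped_sine(response, delays, decays, omegas):
    """Grid search over (delay, decay, omega); amplitudes by least squares."""
    best = None
    for delay in delays:
        for decay in decays:
            for omega in omegas:
                basis = damped_sine_basis(len(response), delay, decay, omega)
                amps, *_ = np.linalg.lstsq(basis, response, rcond=None)
                err = np.sum((basis @ amps - response) ** 2)
                if best is None or err < best[0]:
                    best = (err, delay, decay, omega, amps)
    return best

# Synthetic response with known parameters (assumption for the demo).
response = damped_sine_basis(128, 10, 0.05, 0.4) @ np.array([1.0, 0.5])

best_err, best_delay, best_decay, best_omega, best_amps = fit_damped_sine(
    response,
    delays=range(0, 20),
    decays=np.linspace(0.01, 0.10, 10),
    omegas=np.linspace(0.1, 1.0, 10),
)
```

The fitted delay, decay, frequency and amplitudes are then the per-position parameters that a meta-model over azimuth and elevation would try to predict.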