simple lip synching
Posted 02 August 2010 - 07:45 PM
Obviously this falls down for "th" and "m", where the amplitude is high but the mouth is shut... what's the next step after this before it starts getting impossible to understand? :)
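For reference, the amplitude-driven approach being discussed could look something like this minimal sketch (function and parameter names are my assumptions): drive mouth openness from the RMS amplitude of short audio windows. As noted, it can't distinguish "th"/"m" from open-mouth sounds.

```python
import math

def mouth_openness(samples, window_size=512):
    """Return one mouth-openness value in [0, 1] per window of samples."""
    openness = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        # RMS amplitude of this window
        rms = math.sqrt(sum(s * s for s in window) / window_size)
        # Arbitrary gain, then clamp to [0, 1]; tune the gain per recording.
        openness.append(min(1.0, rms * 4.0))
    return openness
```

A smoothing pass (e.g. averaging neighboring values) would usually follow, so the jaw doesn't flap on every window boundary.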
Posted 02 August 2010 - 08:05 PM
I suppose I could try that.
Posted 02 August 2010 - 09:16 PM
edit: these are older course notes, but they appear to follow the approach I described: http://www.kbs.twi.t...labi/speech.pdf
Posted 02 August 2010 - 10:06 PM
Anyway, as roel says it's not a trivial subject, but it is also a very well researched one, and thankfully a lot easier than full speech recognition. I'm sure you can find some papers online.
Finally, I believe the amplitude version was used in Half-Life 1, FWIW.
Posted 03 August 2010 - 11:09 AM
What I've got in mind is first recording all the "visemes" I wish to use and getting an FFT slice of each one.
Then it analyzes the waveform, and for each slice it'll do an amplitude and phase check on each bin of each viseme, then it'll pick the viseme with the least difference.
So I'll report back how successful this is in the near future.
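The matching step described above could be sketched roughly as follows (numpy and all names here are my assumptions, not the poster's code). One caveat: comparing raw phase per bin is fragile because it shifts with slice alignment, so this sketch compares magnitudes only.

```python
import numpy as np

def make_template(samples):
    """FFT magnitude spectrum of one recorded viseme slice."""
    return np.abs(np.fft.rfft(samples))

def best_viseme(slice_samples, templates):
    """templates: dict mapping viseme name -> template spectrum.

    Returns the viseme whose spectrum differs least (in the
    sum-of-squared-differences sense) from the incoming slice.
    All slices must be the same length so the bins line up.
    """
    spectrum = np.abs(np.fft.rfft(slice_samples))
    return min(templates,
               key=lambda name: np.sum((templates[name] - spectrum) ** 2))
```

In practice the templates and slices would also need amplitude normalization, since the same viseme spoken louder scales every bin.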
Posted 03 August 2010 - 04:16 PM
Posted 03 August 2010 - 04:21 PM
Posted 03 August 2010 - 04:26 PM
Well, you're right if you want AAA animation, but if you're happy with something a little less than perfect (yes, me...) then you don't need it. Speech recognition needs to be more precise (because it's all about recognizing a phoneme no matter WHO says it... something I'm NOT going to bother with). Unless you're going for Walt Disney animation... then I guess I see your point.
Posted 03 August 2010 - 07:10 PM
Speech rec is harder because it must discriminate among a greater number of phonemes and then also put those together into words. Lipsync doesn't require extracting words. I'm not sure what you mean by including emotion? If you mean lipsynching while smiling/scowling/whatever, that can probably be done via additive or blended animations. (I'm assuming the emotion parts would be hand-animated, not extracted from the tone of voice automatically - although that is also an interesting problem!)
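The additive/blended idea mentioned above could be sketched like this, assuming each pose is a dict of blendshape weights (all names here are hypothetical): the lipsync pose is the base, and an emotion pose is layered on top with a weight.

```python
def additive_blend(base, layer, weight=1.0):
    """Add a weighted animation layer on top of a base pose.

    base, layer: dicts mapping blendshape name -> weight.
    """
    out = dict(base)
    for name, value in layer.items():
        out[name] = out.get(name, 0.0) + value * weight
    return out
```

For example, blending a hand-animated "smile" layer at half strength over whatever pose the lipsync is currently producing, so the two systems stay independent.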
Posted 03 August 2010 - 07:18 PM
Because, if you have predetermined tracks, which is predominantly the case for games, then there are various ways to mark up audio to articulate and sync the animations.
Obviously, if you are allowing users to input a wave and you want to lipsync an avatar to it, then that gets tricky as the others have illustrated. But it may be useful to re-think the question before looking at the answers...
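The markup approach for predetermined tracks amounts to authoring (by hand or with a tool) a timeline of viseme cues alongside the audio file, then looking up the active cue at playback time. A minimal sketch, with the format and names being my assumptions:

```python
import bisect

def viseme_at(timeline, t):
    """Return the viseme active at time t (seconds).

    timeline: list of (start_seconds, viseme) pairs, sorted by time.
    """
    times = [start for start, _ in timeline]
    # Find the last cue that starts at or before t.
    i = bisect.bisect_right(times, t) - 1
    return timeline[max(i, 0)][1]
```

Since the audio is fixed, this lookup is all the runtime has to do; all the hard analysis (or hand-tuning) happens offline.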
Posted 03 August 2010 - 07:58 PM
Posted 04 August 2010 - 03:05 AM
Posted 04 August 2010 - 10:54 AM
I might be half relying on this when my phoneme detector isn't working at all, hehe.
Manually animating the face for every single bit of the spoken wave is something I'm trying to avoid; I know that's another solution, but a work-heavy one.
Posted 04 August 2010 - 11:07 AM