Visimes vs Phonemes?
joeedh
08-14-2007, 11:07 PM
Hi. In the book Stop Staring (by Jason Osipa) the author discusses how, in traditional animation, phonetic-based mouth shapes are used, while in 3D animation a more generalized system based on a superset of "visimes" is more useful. The idea, I believe, is that mouth shapes during speech are fairly relative: different shapes blend together and act in not exactly obvious ways, so they are not well suited to a phonetic-shape breakdown (unless you're drawing cartoons frame by frame, in which case it works).
So visimes are a system that tries to capture the subtler effects of speech while also offering the artist more control, whereas phonemes are more of a technical method. Which is better? I'm still working on the shape keys for my rig (based on the stuff in Stop Staring) so I haven't tried it myself. And I've only dabbled very lightly in phonetic-based syncing, a long time ago.
Joe
One of those big questions that will get all kinds of answers. I have used both methods and both have their advantages. Straight-up phonemes are easier and faster to create in some ways. They don't really have to work closely with each other, as you tend to just go from one to the other. I refer to these as absolute targets: if you need an "O" sound, that is what you create.
The second method I refer to as blended targets. I believe this is the better system, but it also takes more time to get right. Each target has to blend with the others to produce any given shape that is needed, and you need to do lots of testing to get it to work.
If I'm doing a very fast project like a TV commercial, where you know exactly what the character is going to say, then an absolute system might get you what you want. You can even add targets as you are animating: just create the shape that you need and add it. This works well with limited facial animation needs. Blended systems give the animator a more flexible facial system that doesn't force them to use the same targets over and over. It will look more natural when animated as well. The setup time isn't that bad once you have done it a couple of times, but it does need testing as you create it.
Both can produce great results.
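As a rough illustration of the absolute vs. blended distinction described above, here is a minimal Python sketch. Everything in it is made up for the example: the target names (`open`, `wide`), the toy three-value "mesh", and the `blend` function. A real rig would do this with 3D vertex positions inside the animation package.

```python
# Hypothetical toy data: each target is a list of per-vertex offsets (deltas)
# from the neutral face. Real meshes would store a 3D vector per vertex.
NEUTRAL = [0.0, 0.0, 0.0]
TARGETS = {
    "open": [0.0, -1.0, 0.0],   # lips/jaw open
    "wide": [0.5, 0.0, -0.5],   # mouth corners pulled wide
}

def blend(weights):
    """Add each target's delta, scaled by its weight, onto the neutral shape."""
    result = list(NEUTRAL)
    for name, weight in weights.items():
        for i, delta in enumerate(TARGETS[name]):
            result[i] += weight * delta
    return result

# Absolute use: one target fully on at a time.
absolute_pose = blend({"open": 1.0})
# Blended use: several targets mixed to reach an in-between shape.
blended_pose = blend({"open": 0.5, "wide": 0.5})
```

The point of the blended case is that the targets have to be sculpted so that partial mixes like this still look right, which is where the extra testing time goes.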
Ruramuq
08-15-2007, 01:02 PM
I'm not familiar with the term visimes, but it seems more logical to blend phonemes and simplify them. Using phonemes literally/absolutely could be useful to give emphasis to an expression and make it characteristic, but as far as I know it is not possible to animate using only phonemes because of the frame rate (24 fps / 30 fps), and the result would look odd in most cases.
The speed of speech matters: speaking slowly gives more time to shape the mouth properly.
joeedh
08-16-2007, 02:44 AM
In the book I'm using, the author actually wrote a script to cut shapes into different component shapes, so you start out making really basic poses (like a smiling shape, frown shape, sneer shape, and some basic phonetic shapes for the mouth), then you break them into smaller pieces.
The script basically works by inverting weightmaps. The idea is that you paint the influence weight of one half of a blendshape, then it duplicates the shape and inverts the weightmap. This results in two shapes that always *perfectly* blend together. Kind of an interesting idea.
Joe
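The weightmap-inversion idea described above can be sketched in a few lines of Python. This is not the script from the book, only an illustration under simplifying assumptions: the shape is a list of scalar per-vertex offsets, and `split_shape`, `smile` and `left_weights` are hypothetical names and values.

```python
def split_shape(delta, weightmap):
    """Split one blendshape delta into two complementary halves.

    delta:     per-vertex offsets of the full shape from the neutral face
    weightmap: painted per-vertex influence in [0, 1] for the first half
    """
    inverted = [1.0 - w for w in weightmap]            # the inverted weightmap
    half_a = [d * w for d, w in zip(delta, weightmap)]
    half_b = [d * w for d, w in zip(delta, inverted)]
    return half_a, half_b

# Hypothetical smile delta on four vertices, left to right across the mouth.
smile = [2.0, 4.0, 4.0, 2.0]
left_weights = [1.0, 0.75, 0.25, 0.0]   # painted falloff for the left side

smile_left, smile_right = split_shape(smile, left_weights)
recombined = [a + b for a, b in zip(smile_left, smile_right)]
```

Because the two weightmaps sum to 1 at every vertex, turning both halves fully on reproduces the original shape, which is why the pieces blend together cleanly.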
LucentDreams
08-16-2007, 03:18 AM
Visemes leave the animator a little more flexibility, but phonemes are handy, particularly for quicker production work, in that the lip-sync formulas apply. Speaking as a traditional animator, I can listen to an audio clip and jot down the entire lip sync on a paper dope sheet so quickly that using phonemes will get an okay job done in minutes. Overall, though, visemes are the more robust setup: it means that AHHH can appear slightly different depending on the character's mood, etc. If it's a project where you're going to use the rig over and over and it needs to remain highly usable, then I say the viseme approach is better.
JasonOsipa
12-11-2007, 07:29 PM
To ask Visimes v Phonemes can be a misleading question in that it can create a pairing of technique and technology or rigging and animating, when really, they are all different things.
Visimes can do anything Phonemes can do, but the reverse is not true, so in a technical aspect, Visimes are "better". Phonemes can provide a method of interaction, or workflow, really, that many people prefer, but that is workflow or interaction, not technology. Since visimes can be used together to create phonemes, through the use of a pose library or augmented control scheme, the question of better/worse becomes largely moot, as it really all becomes about the UI you want. To take visimes, and create a phoneme pose library is trivial.
So the question of which is "better" really breaks into two major questions (and a spadillion minor ones):
1) which is better to build and rig?
2) which is better to animate with?
My own personal thoughts on 1) is visimes (using taper techniques) just because it's SO much faster to do, and gives you more options at the back end, with fewer fix shapes needed.
My own personal thoughts on 2) is that it is all preference and project goals and timelines. Sometimes I prefer pulling on pose libraries (that include phonemes), and other times, I prefer sculpting each bit of sync and expression as I go.
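To make the "phoneme pose library on top of visimes" idea above concrete, here is a small Python sketch. The control names (`open`, `wide`, `narrow`), the weight values, and the `pose_for` helper are all hypothetical and chosen only for illustration; they are not taken from the book or from any particular rig.

```python
# Hypothetical visime controls and weights; values are illustrative only.
PHONEME_POSES = {
    "AH": {"open": 1.0, "wide": 0.0, "narrow": 0.0},
    "EE": {"open": 0.25, "wide": 1.0, "narrow": 0.0},
    "OO": {"open": 0.25, "wide": 0.0, "narrow": 1.0},
}

def pose_for(phoneme, adjustments=None):
    """Look up a stored phoneme pose, then nudge individual visime controls."""
    pose = dict(PHONEME_POSES[phoneme])   # copy so the library stays untouched
    for control, offset in (adjustments or {}).items():
        # Clamp to the 0..1 range these controls are assumed to use.
        pose[control] = min(1.0, max(0.0, pose.get(control, 0.0) + offset))
    return pose

plain_ah = pose_for("AH")
wider_ah = pose_for("AH", {"wide": 0.5})
```

The library gives the phoneme-style workflow, while the underlying controls remain available for sculpting each bit of sync by hand.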
Thanks for dropping in, Jason. Since your post I'm a little confused and couldn't find a reference in your book to help me out: just how many zeros are in a spadillion?
Great explanation of your choices.
I'm in the same camp as Jason on this. After doing quite a bit of research on phonemes, muscle action units and visemes, I'm coming to the conclusion that visemes are essentially more powerful than the first two, because you can build the correctives and phonemes through the visemes.
A case in point: taking 7 action units (i.e. muscle poses) of the face, you need 30 correctives. Now, this is based on Paul Ekman's research and his specific rules. But if you didn't follow this and used, say, 6 targets all able to blend with each other and their combinations, you'd need 63 correctives.
For 9 base shapes you'd need a staggering 501 correctives. This follows the math 'choose' basis for the rules, so:
9 choose 2
9 choose 3
9 choose 4 etc, etc.
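The 'choose' sums can be checked with a short Python sketch. The post does not say exactly which subset sizes are counted, so `combination_count` and its `smallest`/`largest` parameters are assumptions made here to compare the possibilities.

```python
from math import comb

def combination_count(n, smallest=2, largest=None):
    """Number of ways to pick between `smallest` and `largest` of n shapes."""
    if largest is None:
        largest = n
    return sum(comb(n, k) for k in range(smallest, largest + 1))

six_all_subsets = combination_count(6, smallest=1)   # every non-empty subset
six_two_or_more = combination_count(6)               # 6 choose 2 ... 6 choose 6
nine_two_to_eight = combination_count(9, largest=8)  # 9 choose 2 ... 9 choose 8
nine_two_to_nine = combination_count(9)              # also counts all nine at once
```

Under these assumptions the results are 63, 57, 501 and 502 respectively, so the quoted 63 matches counting every non-empty subset of 6 shapes (single shapes included), while the quoted 501 matches 9 choose 2 through 9 choose 8. Either way the count grows roughly as 2^n, which is the point being made.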
The conclusion I'm coming to is that the choice of rigging approach depends on the fidelity of the rig: the higher the detail, the more it sways towards either a bone-per-vert or a morph corrective system. The fewer the verts, the more leverage rigging controls have over the detail of the face, i.e. you can do more muscle- and bone-based rigging the lower the detail.
fidelity of controls = (fidelity of movement / Level of detail)