Unsupervised Learning of Style-Aware Facial Animation from Real Acting
Performances
- URL: http://arxiv.org/abs/2306.10006v3
- Date: Fri, 1 Sep 2023 18:08:05 GMT
- Title: Unsupervised Learning of Style-Aware Facial Animation from Real Acting
Performances
- Authors: Wolfgang Paier and Anna Hilsmann and Peter Eisert
- Abstract summary: We present a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering.
Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters.
For realistic real-time rendering, we train a U-Net that refines rasterization-based renderings by computing improved pixel colors and a foreground matte.
- Score: 3.95944314850151
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel approach for text/speech-driven animation of a
photo-realistic head model based on blend-shape geometry, dynamic textures, and
neural rendering. Training a VAE for geometry and texture yields a parametric
model for accurate capturing and realistic synthesis of facial expressions from
a latent feature vector. Our animation method is based on a conditional CNN
that transforms text or speech into a sequence of animation parameters. In
contrast to previous approaches, our animation model learns to disentangle and
synthesize different acting styles in an unsupervised manner, requiring only
phonetic labels that describe the content of training sequences.
For realistic real-time rendering, we train a U-Net that refines
rasterization-based renderings by computing improved pixel colors and a
foreground matte. We compare our framework qualitatively/quantitatively against
recent methods for head modeling as well as facial animation and evaluate the
perceived rendering/animation quality in a user study, which indicates large
improvements compared to state-of-the-art approaches.
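The abstract outlines a three-stage pipeline: a VAE-based expression model, a conditional CNN that maps text/speech features and a style code to animation parameters, and a U-Net that refines rasterized renderings into final pixel colors and a foreground matte. Below is a minimal, hypothetical PyTorch sketch of the latter two stages; all module names, layer sizes, and the style-code dimension are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AnimationCNN(nn.Module):
    """Hypothetical conditional 1D CNN: phonetic/speech features + style code -> animation parameters."""
    def __init__(self, feat_dim=80, style_dim=16, param_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim + style_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, param_dim, kernel_size=1),
        )

    def forward(self, feats, style):
        # feats: (B, T, feat_dim); style: (B, style_dim), broadcast over the time axis
        style_seq = style.unsqueeze(1).expand(-1, feats.size(1), -1)
        x = torch.cat([feats, style_seq], dim=-1)
        return self.net(x.transpose(1, 2)).transpose(1, 2)  # (B, T, param_dim)

class RefinementUNet(nn.Module):
    """Hypothetical U-Net that turns a rasterized rendering into refined RGB + a foreground matte."""
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(base * 2, 4, 3, padding=1)  # 3 RGB channels + 1 matte channel

    def forward(self, render):
        e1 = self.enc1(render)
        d = self.dec(self.enc2(e1))
        out = self.head(torch.cat([d, e1], dim=1))   # skip connection from the encoder
        rgb, matte = out[:, :3], torch.sigmoid(out[:, 3:])
        return rgb, matte

# Example shapes (hypothetical): 100 feature frames, 16-dim style code, 256x256 rasterized render.
params = AnimationCNN()(torch.randn(1, 100, 80), torch.randn(1, 16))  # (1, 100, 128)
rgb, matte = RefinementUNet()(torch.randn(1, 3, 256, 256))            # (1, 3, 256, 256), (1, 1, 256, 256)
```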
Related papers
- FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations [65.64014682930164]
Sketch animations offer a powerful medium for visual storytelling, from simple flip-book doodles to professional studio productions.
We present FlipSketch, a system that brings back the magic of flip-book animation -- just draw your idea and describe how you want it to move!
arXiv Detail & Related papers (2024-11-16T14:53:03Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography"
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process.
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- Style Transfer for 2D Talking Head Animation [11.740847190449314]
We present a new method to generate talking head animation with learnable style references.
Our framework can reconstruct 2D talking head animation based on a single input image and an audio stream.
arXiv Detail & Related papers (2023-03-17T07:02:59Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also synthesizing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z)
- Neural Face Models for Example-Based Visual Speech Synthesis [2.2817442144155207]
We present a marker-less approach for facial motion capture based on multi-view video.
We learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure.
arXiv Detail & Related papers (2020-09-22T07:35:33Z)
- Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)
- A Robust Interactive Facial Animation Editing System [0.0]
We propose a new learning-based approach to easily edit a facial animation from a set of intuitive control parameters.
We use a resolution-preserving fully convolutional neural network that maps control parameters to sequences of blendshape coefficients (a minimal sketch of such a mapping is given after this list).
The proposed system is robust and can handle coarse, exaggerated edits from non-specialist users.
arXiv Detail & Related papers (2020-07-18T08:31:02Z)
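As noted in the last entry, the editing network maps intuitive control parameters to sequences of blendshape coefficients while preserving temporal resolution. Below is a minimal, hypothetical sketch of such a mapping; the control and blendshape dimensions, layer sizes, and names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ControlToBlendshapes(nn.Module):
    """Hypothetical 1D fully convolutional net: control parameters -> blendshape coefficients.
    No temporal striding, so the output sequence keeps the input temporal resolution."""
    def __init__(self, ctrl_dim=8, num_blendshapes=51, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(ctrl_dim, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, num_blendshapes, kernel_size=1),
        )

    def forward(self, controls):
        # controls: (B, T, ctrl_dim) -> blendshape coefficients: (B, T, num_blendshapes)
        return self.net(controls.transpose(1, 2)).transpose(1, 2)

# Example: a 120-frame sequence edited via 8 intuitive control curves.
coeffs = ControlToBlendshapes()(torch.randn(2, 120, 8))
print(coeffs.shape)  # torch.Size([2, 120, 51])
```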