Neural Face Models for Example-Based Visual Speech Synthesis
- URL: http://arxiv.org/abs/2009.10361v1
- Date: Tue, 22 Sep 2020 07:35:33 GMT
- Title: Neural Face Models for Example-Based Visual Speech Synthesis
- Authors: Wolfgang Paier and Anna Hilsmann and Peter Eisert
- Abstract summary: We present a marker-less approach for facial motion capture based on multi-view video.
We learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure.
- Score: 2.2817442144155207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Creating realistic animations of human faces with computer graphic models is
still a challenging task. It is often solved either with tedious manual work or
motion capture based techniques that require specialised and costly hardware.
Example-based animation approaches circumvent these problems by re-using
captured data of real people. This data is split into short motion samples that
can be looped or concatenated in order to create novel motion sequences. The
obvious advantages of this approach are the simplicity of use and the high
realism, since the data exhibits only real deformations. Rather than tuning
weights of a complex face rig, the animation task is performed on a higher
level by arranging typical motion samples in a way such that the desired facial
performance is achieved. Two difficulties with example-based approaches,
however, are high memory requirements as well as the creation of artefact-free
and realistic transitions between motion samples. We solve these problems by
combining the realism and simplicity of example-based animations with the
advantages of neural face models. Our neural face model is capable of
synthesising high quality 3D face geometry and texture according to a compact
latent parameter vector. This latent representation reduces memory requirements
by a factor of 100 and helps create seamless transitions between concatenated
motion samples. In this paper, we present a marker-less approach for facial
motion capture based on multi-view video. Based on the captured data, we learn
a neural representation of facial expressions, which is used to seamlessly
concatenate facial performances during the animation procedure. We demonstrate
the effectiveness of our approach by synthesising mouthings for Swiss-German
sign language based on viseme query sequences.
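The core animation idea — representing each motion sample as a sequence of compact latent vectors and blending in latent space where concatenated samples meet — can be sketched as below. This is a minimal illustration, not the paper's implementation: the latent dimensionality, the linear cross-fade, and the function names are illustrative assumptions (the paper's neural face model would decode each latent vector to 3D geometry and texture).

```python
import numpy as np

# Illustrative latent size; the paper only states that the latent
# parameter vector is compact (~100x smaller than raw mesh data).
LATENT_DIM = 32

def blend_transition(sample_a, sample_b, overlap=5):
    """Concatenate two motion samples (arrays of per-frame latent
    vectors) with a linear cross-fade over the last/first `overlap`
    frames, so the decoded faces transition without a visible jump."""
    a_tail = sample_a[-overlap:]
    b_head = sample_b[:overlap]
    w = np.linspace(0.0, 1.0, overlap)[:, None]  # per-frame blend weight
    blended = (1.0 - w) * a_tail + w * b_head
    return np.concatenate([sample_a[:-overlap], blended, sample_b[overlap:]])

# Two toy motion samples: 20 frames of latent codes each.
rng = np.random.default_rng(0)
a = rng.normal(size=(20, LATENT_DIM))
b = rng.normal(size=(20, LATENT_DIM))
seq = blend_transition(a, b, overlap=5)
print(seq.shape)  # (35, 32): 15 frames of a, 5 blended, 15 frames of b
```

Blending in latent space rather than on raw vertex positions is what makes this cheap: the cross-fade touches only a handful of small vectors per transition, and the decoder guarantees that every interpolated code still maps to a plausible face.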
Related papers
- Learning Semantic Facial Descriptors for Accurate Face Animation [43.370084532812044]
We introduce the semantic facial descriptors in learnable disentangled vector space to address the dilemma.
We obtain basis vector coefficients by employing an encoder on the source and driving faces, leading to effective facial descriptors in the identity and motion subspaces.
Our approach successfully addresses the issue of model-based methods' limitations in high-fidelity identity and the challenges faced by model-free methods in accurate motion transfer.
arXiv Detail & Related papers (2025-01-29T15:40:42Z)
- SimVS: Simulating World Inconsistencies for Robust View Synthesis [102.83898965828621]
We present an approach for leveraging generative video models to simulate the inconsistencies in the world that can occur during capture.
We demonstrate that our world-simulation strategy significantly outperforms traditional augmentation methods in handling real-world scene variations.
arXiv Detail & Related papers (2024-12-10T17:35:12Z)
- Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances [3.95944314850151]
We present a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering.
Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters.
For realistic real-time rendering, we train a U-Net that refines pixelization-based renderings by computing improved colors and a foreground matte.
arXiv Detail & Related papers (2023-06-16T17:58:04Z)
- Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur [68.24599239479326]
We develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images.
Our model surpasses state-of-the-art point-based methods for novel view synthesis.
arXiv Detail & Related papers (2023-04-25T08:36:33Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [27.989344587876964]
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness.
We propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook.
We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-01-06T05:04:32Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, as well as plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.