Neural Face Models for Example-Based Visual Speech Synthesis
- URL: http://arxiv.org/abs/2009.10361v1
- Date: Tue, 22 Sep 2020 07:35:33 GMT
- Title: Neural Face Models for Example-Based Visual Speech Synthesis
- Authors: Wolfgang Paier and Anna Hilsmann and Peter Eisert
- Abstract summary: We present a marker-less approach for facial motion capture based on multi-view video.
We learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure.
- Score: 2.2817442144155207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Creating realistic animations of human faces with computer graphic models is
still a challenging task. It is often solved either with tedious manual work or
motion-capture-based techniques that require specialised and costly hardware.
Example-based animation approaches circumvent these problems by re-using
captured data of real people. This data is split into short motion samples that
can be looped or concatenated to create novel motion sequences. The
obvious advantages of this approach are its simplicity of use and high
realism, since the data exhibits only real deformations. Rather than tuning
the weights of a complex face rig, the animation task is performed on a higher
level by arranging typical motion samples such that the desired facial
performance is achieved. Two difficulties with example-based approaches,
however, are high memory requirements as well as the creation of artefact-free
and realistic transitions between motion samples. We solve these problems by
combining the realism and simplicity of example-based animations with the
advantages of neural face models. Our neural face model is capable of
synthesising high-quality 3D face geometry and texture from a compact
latent parameter vector. This latent representation reduces memory requirements
by a factor of 100 and helps create seamless transitions between concatenated
motion samples. In this paper, we present a marker-less approach for facial
motion capture based on multi-view video. Based on the captured data, we learn
a neural representation of facial expressions, which is used to seamlessly
concatenate facial performances during the animation procedure. We demonstrate
the effectiveness of our approach by synthesising mouthings for Swiss-German
sign language based on viseme query sequences.
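To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of example-based concatenation in latent space: motion samples are stored as sequences of latent parameter vectors, a viseme query selects samples from a captured library, and a short linear crossfade at each seam produces a smooth transition. The sample library, latent dimensionality, blend length, and the `face_model.decode` call are all hypothetical.

```python
import numpy as np

def crossfade_concat(samples, blend_frames=5):
    """Concatenate latent motion samples, linearly blending
    blend_frames frames at each seam to avoid visible jumps.

    samples: list of (T_i, D) arrays of latent parameter vectors.
    Returns one (T, D) array representing the full performance.
    """
    out = samples[0]
    for nxt in samples[1:]:
        k = min(blend_frames, len(out), len(nxt))
        w = np.linspace(0.0, 1.0, k)[:, None]        # blend weights 0 -> 1
        seam = (1.0 - w) * out[-k:] + w * nxt[:k]    # crossfade in latent space
        out = np.concatenate([out[:len(out) - k], seam, nxt[k:]], axis=0)
    return out

# Hypothetical usage: a viseme query sequence selects motion samples from
# a captured library of latent trajectories (128-D latents assumed here).
library = {"A": np.random.randn(30, 128), "O": np.random.randn(24, 128)}
viseme_query = ["A", "O", "A"]
latent_track = crossfade_concat([library[v] for v in viseme_query])
# The neural face model would then decode each latent vector into
# 3D geometry and texture, e.g.: mesh, tex = face_model.decode(z)
```

Blending in the compact latent space rather than on raw meshes and textures reflects the abstract's claim that the latent representation both cuts memory requirements and smooths the seams between concatenated samples.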
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time.
At the core of our method is a hierarchical representation of head models that allows us to capture the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z) - Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances [3.95944314850151]
We present a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering.
Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters.
For realistic real-time rendering, we train a U-Net that refines pixelization-based renderings by computing improved colors and a foreground matte.
arXiv Detail & Related papers (2023-06-16T17:58:04Z) - Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur [68.24599239479326]
We develop a hybrid neural rendering model that combines image-based and neural 3D representations to render high-quality, view-consistent images.
Our model surpasses state-of-the-art point-based methods for novel view synthesis.
arXiv Detail & Related papers (2023-04-25T08:36:33Z) - Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method based on hierarchical audio-vertex attention is proposed.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z) - CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [27.989344587876964]
Speech-driven 3D facial animation has been widely studied, yet there is still a gap in achieving realism and vividness.
We propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook.
We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-01-06T05:04:32Z) - Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
arXiv Detail & Related papers (2021-11-01T15:32:51Z) - PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control facial motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z) - MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z) - Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)