Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
- URL: http://arxiv.org/abs/2402.17364v1
- Date: Tue, 27 Feb 2024 09:56:15 GMT
- Title: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
- Authors: Zicheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang
- Abstract summary: We introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes via neural networks.
Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance across various metrics.
- Score: 31.90503003079933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in implicit representations, such as Neural Radiance Fields
(NeRF), have advanced the generation of realistic and animatable head avatars
from video sequences. These implicit methods still suffer from visual
artifacts and jitter, since the lack of explicit geometric constraints poses a
fundamental challenge in accurately modeling complex facial deformations. In
this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid
representation that encodes explicit dynamic meshes via neural networks to
ensure geometric consistency across various motions and viewpoints. DynTet is
parameterized by coordinate-based networks that learn signed distance,
deformation, and material texture, anchoring the training data to a
predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently
decodes textured meshes with a consistent topology, enabling fast rendering
through a differentiable rasterizer and supervision via a pixel loss. To
enhance training efficiency, we incorporate classical 3D Morphable Models to
facilitate geometry learning and define a canonical space for simplifying
texture learning. These advantages are readily achievable owing to the
effective geometric representation employed in DynTet. Compared with prior
works, DynTet demonstrates significant improvements in fidelity, lip
synchronization, and real-time performance across various metrics. Beyond
producing stable and visually appealing synthesized videos, our method also
outputs dynamic meshes, which promise to enable many emerging
applications.
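The abstract describes a concrete hybrid pipeline: coordinate-based networks predict a signed distance, a motion-conditioned deformation, and a canonical-space texture at the vertices of a fixed tetrahedral grid, after which Marching Tetrahedra extracts a textured mesh for differentiable rasterization against a pixel loss. The PyTorch sketch below illustrates only that representation under stated assumptions; all module names, network sizes, and the conditioning code are hypothetical stand-ins, not the authors' implementation.
```python
# A minimal PyTorch-style sketch of the hybrid representation described above.
# Coordinate-based MLPs over a fixed tetrahedral grid predict a signed
# distance, a per-vertex deformation, and a texture color. Every name and
# dimension here is an illustrative assumption, not the authors' code.
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, hidden=128, depth=3):
    """Small coordinate-based MLP (hypothetical sizes)."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

class DynTetSketch(nn.Module):
    """Predicts SDF, deformation, and color at tetrahedral grid vertices.

    `grid_verts` are the fixed vertex coordinates of a predefined
    tetrahedral grid; `cond` is a motion/expression code (e.g. 3DMM
    coefficients, which the abstract says guide geometry learning).
    """
    def __init__(self, cond_dim=64):
        super().__init__()
        self.sdf_net = make_mlp(3, 1)                # signed distance per vertex
        self.deform_net = make_mlp(3 + cond_dim, 3)  # motion-dependent offset
        self.color_net = make_mlp(3, 3)              # texture in canonical space

    def forward(self, grid_verts, cond):
        n = grid_verts.shape[0]
        sdf = self.sdf_net(grid_verts)                            # (n, 1)
        offset = self.deform_net(
            torch.cat([grid_verts, cond.expand(n, -1)], dim=-1))  # (n, 3)
        deformed = grid_verts + offset
        color = self.color_net(grid_verts)   # queried in the canonical space
        return sdf, deformed, torch.sigmoid(color)

# Usage on a toy grid. Downstream (not shown), Marching Tetrahedra would
# extract the zero level set of `sdf` over the deformed grid as a textured
# mesh with consistent topology, which a differentiable rasterizer
# (e.g. nvdiffrast) renders for supervision with a pixel loss on video frames.
verts = torch.rand(1000, 3) * 2 - 1   # stand-in tetrahedral grid vertices
cond = torch.zeros(1, 64)             # stand-in motion/expression code
sdf, deformed, color = DynTetSketch()(verts, cond)
```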
Related papers
- TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene [25.164085646259856]
This paper introduces a template-free 3D semantic NeRF for dynamic scenes captured from sparse or single-view RGB videos.
By disentangling the motions of interacting entities and optimizing per-entity skinning weights, our method efficiently generates accurate, semantically separable geometries.
arXiv Detail & Related papers (2024-09-26T01:34:42Z)
- FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces [21.946327323788275]
3D rendering of dynamic faces is a challenging problem.
We present a novel representation that enables high-quality rendering of an actor's dynamic facial performances.
arXiv Detail & Related papers (2024-04-22T00:44:13Z)
- High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering [15.009484906668737]
We introduce a novel technique that reconstructs mesh-based blendshape rigs from single or sparse multi-view videos.
Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes.
arXiv Detail & Related papers (2024-01-16T14:41:31Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our method achieves state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z)
- HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z)
- Video-driven Neural Physically-based Facial Asset for Production [33.24654834163312]
We present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets.
Our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
arXiv Detail & Related papers (2022-02-11T13:22:48Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage triangle meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.