Drivable Volumetric Avatars using Texel-Aligned Features
- URL: http://arxiv.org/abs/2207.09774v1
- Date: Wed, 20 Jul 2022 09:28:16 GMT
- Title: Drivable Volumetric Avatars using Texel-Aligned Features
- Authors: Edoardo Remelli, Timur Bagautdinov, Shunsuke Saito, Tomas Simon,
Chenglei Wu, Shih-En Wei, Kaiwen Guo, Zhe Cao, Fabian Prada, Jason Saragih,
Yaser Sheikh
- Abstract summary: Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
- Score: 52.89305658071045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Photorealistic telepresence requires both high-fidelity body modeling and
faithful driving to enable dynamically synthesized appearance that is
indistinguishable from reality. In this work, we propose an end-to-end
framework that addresses two core challenges in modeling and driving full-body
avatars of real people. One challenge is driving an avatar while staying
faithful to details and dynamics that cannot be captured by a global
low-dimensional parameterization such as body pose. Our approach supports
driving of clothed avatars with wrinkles and motion that a real driving
performer exhibits beyond the training corpus. Unlike existing global state
representations or non-parametric screen-space approaches, we introduce
texel-aligned features -- a localised representation which can leverage both
the structural prior of a skeleton-based parametric model and observed sparse
image signals at the same time. Another challenge is modeling a temporally
coherent clothed avatar, which typically requires precise surface tracking. To
circumvent this, we propose a novel volumetric avatar representation by
extending mixtures of volumetric primitives to articulated objects. By
explicitly incorporating articulation, our approach naturally generalizes to
unseen poses. We also introduce a localized viewpoint conditioning, which leads
to a large improvement in generalization of view-dependent appearance. The
proposed volumetric representation does not require high-quality mesh tracking
as a prerequisite and brings significant quality improvements compared to
mesh-based counterparts. In our experiments, we carefully examine our design
choices and demonstrate the efficacy of our approach, outperforming the
state-of-the-art methods on challenging driving scenarios.
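To make the abstract's two key ideas more concrete, the following is a minimal, illustrative sketch in Python of (a) scattering sparse driving-view image features into a UV texel map aligned with a skeletal body template, and (b) expressing the viewing direction in each volumetric primitive's local frame, in the spirit of the localized viewpoint conditioning described above. All function names, shapes, and toy data here are assumptions made for illustration; this is not the paper's actual implementation.

import numpy as np

def project_points(points_3d, K, R, t):
    """Project posed mesh vertices (N, 3) into a driving camera with
    intrinsics K (3, 3) and extrinsics R (3, 3), t (3,).
    Returns pixel coordinates (N, 2) and camera-space depth (N,)."""
    cam = points_3d @ R.T + t               # world -> camera coordinates
    depth = cam[:, 2]
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]          # perspective divide
    return pix, depth

def texel_aligned_features(points_3d, uvs, image_feat, K, R, t, tex_res=64):
    """Scatter per-vertex image features into a UV-space (texel-aligned) map.
    points_3d : (N, 3) posed template vertices from a skeletal body model
    uvs       : (N, 2) per-vertex UV coordinates in [0, 1)
    image_feat: (H, W, C) feature map extracted from one driving view
    Returns a (tex_res, tex_res, C) texel-aligned feature map."""
    H, W, C = image_feat.shape
    pix, depth = project_points(points_3d, K, R, t)
    texmap = np.zeros((tex_res, tex_res, C), dtype=np.float32)
    weight = np.zeros((tex_res, tex_res, 1), dtype=np.float32)
    for p, uv, d in zip(pix, uvs, depth):
        x, y = int(round(p[0])), int(round(p[1]))
        if d <= 0 or not (0 <= x < W and 0 <= y < H):
            continue                        # behind camera or out of frame
        u, v = int(uv[0] * tex_res), int(uv[1] * tex_res)
        texmap[v, u] += image_feat[y, x]    # accumulate sparse observations
        weight[v, u] += 1.0
    return texmap / np.maximum(weight, 1.0)  # average where several vertices land

def localized_view_dirs(primitive_centers, primitive_rotations, cam_center):
    """Express the viewing direction in each volumetric primitive's local
    frame ("localized viewpoint conditioning"): returns (M, 3) unit vectors."""
    dirs = cam_center[None, :] - primitive_centers
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # rotate the world-space view direction into each primitive's local frame
    return np.einsum('mij,mj->mi', primitive_rotations.transpose(0, 2, 1), dirs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    verts = rng.normal(size=(500, 3)) + np.array([0.0, 0.0, 3.0])  # toy "posed body"
    uvs = rng.uniform(size=(500, 2))
    feat = rng.normal(size=(128, 128, 8)).astype(np.float32)       # toy image features
    K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
    tex = texel_aligned_features(verts, uvs, feat, K, np.eye(3), np.zeros(3))
    prim_rots = np.stack([np.eye(3)] * 10)
    ldirs = localized_view_dirs(verts[:10], prim_rots, np.zeros(3))
    print(tex.shape, ldirs.shape)           # (64, 64, 8) (10, 3)

In a full system such a texel map would condition a decoder for the articulated volumetric primitives; the sketch only illustrates how localized, UV-aligned features can combine the body model's structural prior with sparse observed image signals.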
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that allows us to capture the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z) - AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model [58.035758145894846]
We introduce AniDress, a novel method for generating animatable human avatars in loose clothes using very sparse multi-view videos.
A pose-driven deformable neural radiance field conditioned on both body and garment motions is introduced, providing explicit control of both parts.
Our method is able to render natural garment dynamics that deviate highly from the body and generalizes well to both unseen views and poses.
arXiv Detail & Related papers (2024-01-27T08:48:18Z) - From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z) - GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video.
GaussianAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z) - HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field [44.848368616444446]
We introduce a novel hybrid explicit-implicit 3D representation, Facial Model Conditioned Neural Radiance Field, which integrates the expressiveness of NeRF and the prior information from the parametric template.
By adopting an overall GAN-based architecture using an image-to-image translation network, we achieve high-resolution, realistic and view-consistent synthesis of dynamic head appearance.
arXiv Detail & Related papers (2023-09-29T10:45:22Z) - MonoHuman: Animatable Human Neural Field from Monocular Video [30.113937856494726]
We propose a novel framework MonoHuman, which robustly renders view-consistent and high-fidelity avatars under arbitrary novel poses.
Our key insight is to model the deformation field with bi-directional constraints and explicitly leverage off-the-peg keyframe information to reason about features for coherent results.
arXiv Detail & Related papers (2023-04-04T17:55:03Z) - AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling [38.9663410820652]
We exploit autoregressive modeling to capture dynamic effects, such as soft-tissue deformations.
We introduce the notion of articulated observer points, which relate implicit states to the explicit surface of a parametric human body model.
Our approach outperforms the state of the art, achieving plausible dynamic deformations even for unseen motions.
arXiv Detail & Related papers (2022-03-25T17:59:59Z) - Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation [67.23327074124855]
This paper presents an elegant solution for the integration of temporal constraints in the fitting process.
We derive parameters of a sequence of body models, representing shape and motion of a person, including jaw poses, facial expressions, and finger poses.
Our approach enables the derivation of realistic 3D body models from image sequences, including facial expression and articulated hands.
arXiv Detail & Related papers (2022-02-07T11:11:55Z) - Dynamic Neural Garments [45.833166320896716]
We present a solution that takes in body joint motion to directly produce realistic dynamic garment image sequences.
Specifically, given the target joint motion sequence of an avatar, we propose dynamic neural garments to jointly simulate and render plausible dynamic garment appearance.
arXiv Detail & Related papers (2021-02-23T17:21:21Z) - PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs.
Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
arXiv Detail & Related papers (2021-01-07T18:58:46Z)
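The PVA entry above mentions training end-to-end from a photometric re-rendering loss alone. Below is a minimal sketch of such a loss under the assumption of a simple masked per-pixel L2 formulation; the names and toy data are illustrative, not that paper's actual code.

import numpy as np

def photometric_loss(rendered, observed, mask=None):
    """Per-pixel L2 between the re-rendered image and the captured image;
    an optional foreground mask restricts supervision to valid pixels."""
    diff = (rendered - observed) ** 2
    if mask is None:
        return diff.mean()
    diff = diff * mask[..., None]
    return diff.sum() / max(float(mask.sum()) * rendered.shape[-1], 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rendered = rng.uniform(size=(64, 64, 3))   # toy re-rendered image
    observed = rng.uniform(size=(64, 64, 3))   # toy captured image
    mask = (rng.uniform(size=(64, 64)) > 0.5).astype(np.float32)
    print(photometric_loss(rendered, observed, mask))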
This list is automatically generated from the titles and abstracts of the papers in this site.