FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
- URL: http://arxiv.org/abs/2503.19207v2
- Date: Fri, 04 Apr 2025 08:17:08 GMT
- Title: FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
- Authors: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
- Abstract summary: We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Our method generates more authentic reconstruction and animation than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos.
- Score: 74.86864398919467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to the large variations in body shapes, poses, and cloth types, existing methods mostly require hours of per-subject optimization during inference, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer personalized avatar shape, skinning weights, and pose-dependent deformations, which effectively improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process to produce pixel-aligned initial conditions, which helps to reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation to robustly reduce artifacts introduced in canonicalization and fuse a plausible avatar that preserves person-specific identities. Finally, we train the model in an end-to-end framework on a large-scale capture dataset, which contains diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstruction and animation than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos. Project page and code are available at https://github.com/rongakowang/FRESA.
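The animation step described in the abstract follows the standard linear blend skinning (LBS) formulation: the canonical shape is first displaced by pose-dependent offsets, then posed with the personalized per-vertex skinning weights. Below is a minimal NumPy sketch of that formula; the function name, argument shapes, and toy usage are illustrative assumptions, not FRESA's actual interface.

```python
import numpy as np

def lbs(verts, weights, joint_transforms, pose_offsets=None):
    """Pose canonical vertices with per-vertex (personalized) skinning weights.

    verts:            (V, 3) canonical vertices
    weights:          (V, J) skinning weights; each row sums to 1
    joint_transforms: (J, 4, 4) rigid transform of each joint for the target pose
    pose_offsets:     (V, 3) optional pose-dependent deformation, applied first
    """
    if pose_offsets is not None:
        verts = verts + pose_offsets
    # Homogeneous coordinates: (V, 4)
    vh = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)
    # Blend the joint transforms with each vertex's weights: (V, 4, 4)
    blended = np.einsum('vj,jrc->vrc', weights, joint_transforms)
    # Apply the blended transform to each vertex and drop the w-component
    return np.einsum('vrc,vc->vr', blended, vh)[:, :3]

# Toy usage with random inputs (identity transforms leave the shape unchanged)
V, J = 6890, 24
verts = np.random.rand(V, 3)
weights = np.random.rand(V, J)
weights /= weights.sum(axis=1, keepdims=True)
posed = lbs(verts, weights, np.tile(np.eye(4), (J, 1, 1)))
assert np.allclose(posed, verts)
```

With identity joint transforms the blended matrix is the identity for every vertex, so the final assertion simply checks that the skinning math is set up consistently.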
Related papers
- AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction [26.82525451095629]
We propose a robust method for 3D reconstruction from inconsistent images, enabling real-time rendering during inference. We recast the reconstruction problem as a 4D task and introduce an efficient 3D modeling approach using 4D Gaussian Splatting. Experiments demonstrate that our method achieves photorealistic, real-time animation of 3D human avatars from in-the-wild images.
arXiv Detail & Related papers (2024-12-03T18:55:39Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- AniPixel: Towards Animatable Pixel-Aligned Human Avatar [65.7175527782209]
AniPixel is a novel animatable and generalizable human avatar reconstruction method.
We propose a neural skinning field based on skeleton-driven deformation to establish the target-to-canonical and canonical-to-observation correspondences.
Experiments show that AniPixel renders comparable novel views while delivering better novel pose animation results than state-of-the-art methods.
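As a rough illustration of what a neural skinning field computes, the sketch below maps a 3D query point to a softmax-normalized weight distribution over joints, which can then drive LBS to relate the canonical and observation spaces. The layer sizes and the 24-joint skeleton are assumptions for illustration, not AniPixel's published architecture.

```python
import torch
import torch.nn as nn

class SkinningField(nn.Module):
    """Maps a 3D point to a weight distribution over J joints.

    Illustrative sketch only: hidden sizes and the softmax head are
    assumptions, not AniPixel's published design.
    """
    def __init__(self, num_joints=24, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints),
        )

    def forward(self, points):  # points: (N, 3)
        # Softmax guarantees valid skinning weights (non-negative, sum to 1)
        return torch.softmax(self.mlp(points), dim=-1)  # (N, J)

field = SkinningField()
weights = field(torch.rand(1024, 3))  # (1024, 24), each row sums to 1
```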
arXiv Detail & Related papers (2023-02-07T11:04:14Z)
- One-shot Implicit Animatable Avatars with Model-based Priors [31.385051428938585]
ELICIT is a novel method for learning human-specific neural radiance fields from a single image.
ELICIT outperforms strong avatar-creation baselines when only a single image is available.
arXiv Detail & Related papers (2022-12-05T18:24:06Z)
- AvatarGen: A 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen enables unsupervised generation of 3D-aware clothed humans with various appearances and controllable geometries.
Our method can generate animatable 3D human avatars with high-quality appearance and geometry modeling.
It supports many applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
arXiv Detail & Related papers (2022-11-26T15:15:45Z)
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
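The pose-dependent deformation mentioned above can be pictured as an MLP that offsets each canonical-space point conditioned on a pose code. The sketch below is a generic version of that idea; the 72-dimensional pose vector (a SMPL-style parameterization) and the layer sizes are assumptions for illustration, not AvatarGen's exact design.

```python
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Predicts a pose-dependent offset for each canonical-space point.

    Generic sketch of a canonical-space deformation network; sizes are
    assumptions, not AvatarGen's published architecture.
    """
    def __init__(self, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, pose):  # x: (N, 3), pose: (pose_dim,)
        p = pose.expand(x.shape[0], -1)  # share one pose code across all points
        return x + self.mlp(torch.cat([x, p], dim=-1))  # deformed points

net = DeformationNet()
deformed = net(torch.rand(2048, 3), torch.zeros(72))  # (2048, 3)
```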
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
- MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames [59.37430649840777]
We present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner.
We contribute a large-scale dataset, MVP-Human, which contains 400 subjects, each of which has 15 scans in different poses.
Overall, benefiting from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames.
arXiv Detail & Related papers (2022-04-24T03:57:59Z)
- Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z)