RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
- URL: http://arxiv.org/abs/2407.06938v2
- Date: Thu, 11 Jul 2024 03:46:45 GMT
- Title: RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
- Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo,
- Abstract summary: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image.
We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars.
We optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues, and injecting them to the 3D diffusion model at multiple layers via cross-attention.
When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild
- Score: 56.13752698926105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a novel data scheduling strategy and a weight consolidation regularization term, which improves the decoder's capability of rendering sharper details. Additionally, we optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues, and injecting them to the 3D diffusion model at multiple layers via cross-attention. When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild portrait input.
Related papers
- DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D
Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z) - CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning [19.763523500564542]
We propose CHASE, which introduces supervision from intrinsic 3D consistency across poses and 3D geometry contrastive learning.
CHASE achieves performance comparable with sparse inputs to that with full inputs.
Though CHASE is designed for sparse inputs, it surprisingly outperforms current SOTA methods.
arXiv Detail & Related papers (2024-08-19T02:46:23Z) - One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation [31.310769289315648]
This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user.
We learn a generative model for 3D animatable photo-realistic head avatar from a multi-view dataset of expressions from 2407 subjects.
Our method demonstrates compelling results and outperforms existing state-of-the-art methods for few-shot avatar adaptation.
arXiv Detail & Related papers (2024-02-19T07:48:29Z) - DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via
Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face ''Janus'' problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z) - OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane
Rendering [81.55960827071661]
Controllability, generalizability and efficiency are the major objectives of constructing face avatars represented by neural implicit field.
We propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution.
arXiv Detail & Related papers (2023-03-26T09:12:03Z) - Rodin: A Generative Model for Sculpting 3D Digital Avatars Using
Diffusion [66.26780039133122]
This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars.
The memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars.
We can generate highly detailed avatars with realistic hairstyles and facial hair like beards.
arXiv Detail & Related papers (2022-12-12T18:59:40Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.