AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation
- URL: http://arxiv.org/abs/2306.09864v1
- Date: Fri, 16 Jun 2023 14:18:51 GMT
- Title: AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation
- Authors: Yifei Zeng, Yuanxun Lu, Xinya Ji, Yao Yao, Hao Zhu, Xun Cao
- Abstract summary: AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is precise control over avatar generation, achieved with dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
- Score: 14.062402203105712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce AvatarBooth, a novel method for generating high-quality 3D
avatars using text prompts or specific images. Unlike previous approaches that
can only synthesize avatars based on simple text descriptions, our method
enables the creation of personalized avatars from casually captured face or
body images, while still supporting text-based model generation and editing.
Our key contribution is precise control over avatar generation, achieved by
using dual diffusion models fine-tuned separately for the human face and body. This
enables us to capture intricate details of facial appearance, clothing, and
accessories, resulting in highly realistic avatar generations. Furthermore, we
introduce a pose-consistent constraint into the optimization process to enhance the
multi-view consistency of synthesized head images from the diffusion model and
thus eliminate interference from uncontrolled human poses. In addition, we
present a multi-resolution rendering strategy that facilitates coarse-to-fine
supervision of 3D avatar generation, thereby enhancing the performance of the
proposed system. The resulting avatar model can be further edited using
additional text descriptions and driven by motion sequences. Experiments show
that AvatarBooth outperforms previous text-to-3D methods in terms of rendering
and geometric quality from either text prompts or specific images. Please check
our project website at https://zeng-yifei.github.io/avatarbooth_page/.
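The two core mechanisms above, dual fine-tuned diffusion guidance and multi-resolution coarse-to-fine supervision, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the stub epsilon-predictors, the fixed head-crop window, the resolution schedule, and all hyperparameters are illustrative assumptions, and a learnable image stands in for the differentiable 3D avatar renderer.

```python
import torch
import torch.nn.functional as F

class StubDiffusion(torch.nn.Module):
    """Placeholder epsilon-predictor standing in for a fine-tuned diffusion
    model (assumption: the real system uses personalized text-to-image models)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x_noisy, t):
        return self.net(x_noisy)  # real models also condition on t and text

def sds_grad(model, image, alphas):
    """Score-distillation-style gradient: noise the render, let the diffusion
    model predict the noise, and treat the residual as a pixel gradient."""
    t = torch.randint(1, alphas.numel(), (1,))
    noise = torch.randn_like(image)
    a = alphas[t].view(1, 1, 1, 1)
    x_noisy = a.sqrt() * image + (1.0 - a).sqrt() * noise
    return (model(x_noisy, t) - noise).detach()

# A learnable image stands in for the differentiable 3D avatar renderer.
avatar = torch.nn.Parameter(torch.rand(1, 3, 256, 256))
body_model, face_model = StubDiffusion(), StubDiffusion()
alphas = torch.linspace(0.999, 0.01, 1000)
opt = torch.optim.Adam([avatar], lr=1e-2)

for step in range(200):
    res = 64 if step < 100 else 256  # coarse-to-fine supervision schedule
    full = F.interpolate(avatar, size=(res, res), mode="bilinear",
                         align_corners=False)
    # Hypothetical fixed head window; the paper crops the rendered head instead.
    head = F.interpolate(avatar[:, :, :96, 80:176], size=(res, res),
                         mode="bilinear", align_corners=False)
    g_body = sds_grad(body_model, full, alphas)  # body model guides full view
    g_face = sds_grad(face_model, head, alphas)  # face model guides head crop
    loss = (full * g_body).sum() + (head * g_face).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the actual system the face model would supervise renders of the head region and the body model the full figure, with the pose-consistent constraint keeping the synthesized head views mutually consistent across viewpoints.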
Related papers
- DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt.
It has two key designs that help achieve generation diversity and visual quality.
Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z)
- One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation [31.310769289315648]
This paper introduces a novel approach to creating high-quality head avatars using only a single image or a few images per user.
We learn a generative model for 3D animatable, photo-realistic head avatars from a multi-view dataset of expressions from 2407 subjects.
Our method demonstrates compelling results and outperforms existing state-of-the-art methods for few-shot avatar adaptation.
arXiv Detail & Related papers (2024-02-19T07:48:29Z)
- Disentangled Clothed Avatar Generation from Text Descriptions [41.01453534915251]
We introduce a novel text-to-avatar generation method that separately generates the human body and the clothes.
Our approach achieves higher texture and geometry quality and better semantic alignment with text prompts.
arXiv Detail & Related papers (2023-12-08T18:43:12Z)
- AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z)
- AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion [34.609403685504944]
We present AvatarFusion, a framework for zero-shot text-to-avatar generation.
We use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars.
We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes.
arXiv Detail & Related papers (2023-07-13T02:19:56Z)
- HeadSculpt: Crafting 3D Head Avatars with Text [143.14548696613886]
We introduce a versatile pipeline dubbed HeadSculpt for crafting 3D head avatars from textual prompts.
We first equip the diffusion model with 3D awareness by leveraging landmark-based control and a learned textual embedding (a sketch of landmark-conditioned generation follows this entry).
We propose a novel identity-aware editing score distillation strategy to optimize a textured mesh with a high-resolution differentiable rendering technique.
arXiv Detail & Related papers (2023-06-05T16:53:58Z)
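To make the landmark-based control idea above concrete, here is a hedged sketch using a public OpenPose-conditioned ControlNet from the diffusers library. The checkpoint names are real public models, but pairing them this way, and the schematic landmark map, is our illustrative assumption rather than HeadSculpt's released pipeline.

```python
import torch
from PIL import Image, ImageDraw
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Real public checkpoints; using them as a HeadSculpt stand-in is our assumption.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")  # needs a CUDA GPU

# Schematic facial-landmark conditioning image (purely illustrative dots).
cond = Image.new("RGB", (512, 512))
draw = ImageDraw.Draw(cond)
for x, y in [(200, 220), (312, 220), (256, 290), (256, 350)]:  # eyes, nose, mouth
    draw.ellipse([x - 5, y - 5, x + 5, y + 5], fill="white")

# The landmark map pins head pose and layout; the prompt controls identity.
image = pipe("a 3D head avatar of a viking, front view",
             image=cond, num_inference_steps=30).images[0]
image.save("head_front.png")
```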
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z)
- Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way.
Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all.
We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z)
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation (a sketch of SMPL-based guidance follows this entry).
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
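As a concrete note on the SMPL-based guidance mentioned in the DreamAvatar entry above, the sketch below shows how a parametric body from the smplx package can act as a soft density prior for a NeRF-style field. The model directory, the Gaussian falloff, and sigma are illustrative assumptions, not the paper's implementation.

```python
import torch
import smplx  # pip install smplx; also requires downloaded SMPL model files

# "models/" is a hypothetical directory containing the SMPL .pkl files.
body = smplx.create("models/", model_type="smpl")

betas = torch.zeros(1, 10)      # neutral shape coefficients
body_pose = torch.zeros(1, 69)  # axis-angle joint rotations (rest pose)
out = body(betas=betas, body_pose=body_pose, global_orient=torch.zeros(1, 3))
verts = out.vertices            # (1, 6890, 3) posed surface vertices

def smpl_density_prior(points, verts, sigma=0.05):
    """Soft occupancy prior: points near the SMPL surface get density near 1.
    Blending such a prior into a NeRF-style field anchors body shape and pose;
    the Gaussian falloff and sigma are illustrative choices."""
    d = torch.cdist(points, verts[0])  # (N, 6890) point-to-vertex distances
    return torch.exp(-d.min(dim=1).values ** 2 / (2 * sigma ** 2))

query = torch.rand(1024, 3) * 2 - 1       # random points in [-1, 1]^3
prior = smpl_density_prior(query, verts)  # (1024,) density prior values
```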
This list is automatically generated from the titles and abstracts of the papers on this site.