Disentangled Clothed Avatar Generation from Text Descriptions
- URL: http://arxiv.org/abs/2312.05295v1
- Date: Fri, 8 Dec 2023 18:43:12 GMT
- Title: Disentangled Clothed Avatar Generation from Text Descriptions
- Authors: Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang,
Xin Li, Wenping Wang, Rong Xie, Li Song
- Abstract summary: We introduce a novel text-to-avatar generation method that separately generates the human body and the clothes.
Our approach achieves higher texture and geometry quality and better semantic alignment with text prompts.
- Score: 39.5476255730693
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce a novel text-to-avatar generation method that
separately generates the human body and the clothes and allows high-quality
animation on the generated avatar. While recent advancements in text-to-avatar
generation have yielded diverse human avatars from text prompts, these methods
typically combine all elements (clothes, hair, and body) into a single 3D
representation. Such an entangled approach poses challenges for downstream
tasks like editing or animation. To overcome these limitations, we propose a
novel disentangled 3D avatar representation named Sequentially Offset-SMPL
(SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and
clothes with two separate meshes, but associates them with offsets to ensure
the physical alignment between the body and the clothes. Then, we design a
Score Distillation Sampling (SDS)-based distillation framework to generate the
proposed SO-SMPL representation from text prompts. In comparison with existing
text-to-avatar methods, our approach not only achieves higher texture and
geometry quality and better semantic alignment with text prompts, but also
significantly improves the visual quality of character animation, virtual
try-on, and avatar editing. Our project page is at
https://shanemankiw.github.io/SO-SMPL/.
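To make the layered idea concrete, below is a minimal Python sketch of a two-mesh, offset-based representation in the spirit of SO-SMPL. The normal-direction offset parameterization, the clothing mask, the function names (vertex_normals, clothed_layers), and the toy tetrahedron geometry are illustrative assumptions, not the paper's exact formulation, which builds the clothing layer on the SMPL body model.

```python
import numpy as np

# Illustrative sketch only: the body is one mesh; the clothing layer shares its
# topology and is obtained by displacing body vertices along their normals, so
# re-posing the body keeps the clothes physically aligned with it.

def vertex_normals(verts: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Area-weighted per-vertex normals of a triangle mesh."""
    normals = np.zeros_like(verts)
    tris = verts[faces]                                   # (F, 3, 3)
    face_n = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    for i in range(3):                                    # accumulate onto vertices
        np.add.at(normals, faces[:, i], face_n)
    norm = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(norm, 1e-8, None)

def clothed_layers(body_verts, faces, cloth_offsets, cloth_mask):
    """Return (body_vertices, clothes_vertices) as two separate layers.

    cloth_offsets : (V,) scalar displacement per vertex (learned in practice)
    cloth_mask    : (V,) 1 where clothing covers the body, 0 for exposed skin
    """
    n = vertex_normals(body_verts, faces)
    clothes_verts = body_verts + (cloth_mask * cloth_offsets)[:, None] * n
    return body_verts, clothes_verts

# Toy usage: a tetrahedron standing in for the posed body surface.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
faces = np.array([[0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2]])
offsets = np.full(len(verts), 0.02)           # e.g. 2 cm clothing thickness
mask = np.ones(len(verts))
body, clothes = clothed_layers(verts, faces, offsets, mask)
```

Because the clothing vertices are defined relative to the body surface, driving the body with new pose parameters carries the clothes along with it, which is what makes animation, virtual try-on, and editing straightforward in a disentangled representation.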
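The distillation framework is described only as SDS-based; for reference, the standard Score Distillation Sampling gradient from DreamFusion, which such frameworks build on, is

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \,\right],
```

where x = g(theta) is a differentiable rendering of the avatar parameters theta, x_t its noised version at timestep t, y the text prompt, epsilon-hat_phi the pretrained diffusion model's noise prediction, and w(t) a timestep weighting. How the full paper schedules this loss across the body and clothing stages is not covered here.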
Related papers
- DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt.
It has two key designs that help achieve generation diversity and visual quality.
Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z)
- Text-Guided Generation and Editing of Compositional 3D Avatars [59.584042376006316]
Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description.
Existing methods either lack realism, produce unrealistic shapes, or do not support editing.
arXiv Detail & Related papers (2023-09-13T17:59:56Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process.
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion [34.609403685504944]
We present AvatarFusion, a framework for zero-shot text-to-avatar generation.
We use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars.
We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes.
arXiv Detail & Related papers (2023-07-13T02:19:56Z)
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
- Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way.
Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all.
We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z)
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
- AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control [38.959851274747145]
AvatarCraft is a method for creating a 3D human avatar with a specific identity and artistic style that can be easily animated.
We use diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt.
We make the human avatar animatable by deforming the neural implicit field with an explicit warping field.
arXiv Detail & Related papers (2023-03-30T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.