Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven
Body Controllable Attribute
- URL: http://arxiv.org/abs/2401.00711v1
- Date: Mon, 1 Jan 2024 09:39:57 GMT
- Authors: Chaoqun Gong, Yuqin Dai, Ronghui Li, Achun Bao, Jun Li, Jian Yang,
Yachao Zhang, Xiu Li
- Abstract summary: We propose Text2Avatar, which can generate realistic-style 3D avatars based on the coupled text prompts.
To alleviate the scarcity of realistic style 3D human avatar data, we utilize a pre-trained unconditional 3D human avatar generation model.
- Score: 33.330629835556664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating 3D human models directly from text helps reduce the cost and time
of character modeling. However, achieving multi-attribute controllable and
realistic 3D human avatar generation is still challenging due to feature
coupling and the scarcity of realistic 3D human avatar datasets. To address
these issues, we propose Text2Avatar, which can generate realistic-style 3D
avatars based on the coupled text prompts. Text2Avatar leverages a discrete
codebook as an intermediate feature to establish a connection between text and
avatars, enabling the disentanglement of features. Furthermore, to alleviate
the scarcity of realistic style 3D human avatar data, we utilize a pre-trained
unconditional 3D human avatar generation model to obtain a large amount of 3D
avatar pseudo data, which allows Text2Avatar to achieve realistic style
generation. Experimental results demonstrate that our method can generate
realistic 3D avatars from coupled textual data, which is challenging for other
existing methods in this field.
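The codebook mechanism described in the abstract can be illustrated with a toy sketch: a coupled text prompt is parsed into per-attribute choices, and each choice indexes a small discrete codebook whose code vector contributes one slice of the intermediate feature. All attribute names, codebook sizes, and the lookup scheme below are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
# Toy sketch of a discrete-codebook intermediate feature: one small
# codebook per body attribute, so each attribute maps to its own code
# vector and stays disentangled from the others. Names and sizes here
# are illustrative assumptions, not the paper's code.
import random

random.seed(0)

ATTRIBUTES = {
    "sleeve_length": ["short", "long"],
    "pants_length": ["short", "long"],
    "hair_length": ["short", "long"],
}
CODE_DIM = 8

# Each codebook row is the code vector for one discrete attribute value.
codebooks = {
    name: [[random.gauss(0, 1) for _ in range(CODE_DIM)] for _ in values]
    for name, values in ATTRIBUTES.items()
}

def encode_prompt(attrs):
    """Map parsed attribute choices to a concatenated intermediate feature.

    `attrs` maps attribute name -> chosen value, e.g. parsed from a
    coupled prompt like "short sleeves, long pants, long hair".
    """
    feature = []
    for name, values in ATTRIBUTES.items():
        idx = values.index(attrs[name])       # discrete code index
        feature.extend(codebooks[name][idx])  # codebook lookup
    return feature

feat = encode_prompt(
    {"sleeve_length": "short", "pants_length": "long", "hair_length": "long"}
)
print(len(feat))  # 24 = 3 attributes x 8-dim codes
```

Because each attribute owns its own codebook, changing one attribute in the prompt swaps only that attribute's slice of the feature vector while the others stay fixed, which is the disentanglement property the abstract attributes to the codebook.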
Related papers
- WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation [55.85887047136534] (2024-07-02)
WildAvatar is a web-scale in-the-wild human avatar creation dataset extracted from YouTube.
We evaluate several state-of-the-art avatar creation methods on the dataset, highlighting unexplored challenges in real-world avatar creation.
- AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707] (2023-11-29)
AvatarStudio is a coarse-to-fine generative model that produces explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and a DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
- AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose [23.76390935089982] (2023-08-07)
We present AvatarVerse, a stable pipeline for generating expressive, high-quality 3D avatars from text descriptions and pose guidance.
To this end, we propose a 3D avatar modeling approach that is not only more expressive but also more stable and of higher quality.
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712] (2023-06-16)
AvatarBooth is a novel method for generating high-quality 3D avatars from text prompts or specific images.
Our key contribution is precise control over avatar generation via dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that enables coarse-to-fine supervision of 3D avatar generation.
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047] (2023-05-21)
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from abundant image priors of a diffusion model conditioned on various poses.
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785] (2023-04-03)
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
- Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion [66.26780039133122] (2022-12-12)
This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars.
Memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars.
We can generate highly detailed avatars with realistic hairstyles and facial hair such as beards.
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352] (2022-08-01)
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network that learns pose-dependent deformations in the canonical space.
Our method generates animatable human avatars with high-quality appearance and geometry, significantly outperforming previous 3D GANs.
- AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars [37.43588165101838] (2022-05-17)
AvatarCLIP is a zero-shot text-driven framework for 3D avatar generation and animation.
We take advantage of the powerful vision-language model CLIP to supervise neural human generation.
By leveraging the priors learned in a motion VAE, a CLIP-guided reference-based motion synthesis method is proposed to animate the generated 3D avatar.
This list is automatically generated from the titles and abstracts of the papers in this site.