AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
- URL: http://arxiv.org/abs/2205.08535v1
- Date: Tue, 17 May 2022 17:59:19 GMT
- Title: AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
- Authors: Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang,
Ziwei Liu
- Abstract summary: AvatarCLIP is a zero-shot text-driven framework for 3D avatar generation and animation.
We take advantage of the powerful vision-language model CLIP for supervising neural human generation.
By leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D avatar creation plays a crucial role in the digital age. However, the
whole production process is prohibitively time-consuming and labor-intensive.
To democratize this technology to a larger audience, we propose AvatarCLIP, a
zero-shot text-driven framework for 3D avatar generation and animation. Unlike
professional software that requires expert knowledge, AvatarCLIP empowers
layman users to customize a 3D avatar with the desired shape and texture, and
drive the avatar with the described motions using solely natural language. Our
key insight is to take advantage of the powerful vision-language model CLIP for
supervising neural human generation, in terms of 3D geometry, texture and
animation. Specifically, driven by natural language descriptions, we initialize
3D human geometry generation with a shape VAE network. Based on the generated
3D human shapes, a volume rendering model is utilized to further facilitate
geometry sculpting and texture generation. Moreover, by leveraging the priors
learned in the motion VAE, a CLIP-guided reference-based motion synthesis
method is proposed for the animation of the generated 3D avatar. Extensive
qualitative and quantitative experiments validate the effectiveness and
generalizability of AvatarCLIP on a wide range of avatars. Remarkably,
AvatarCLIP can generate unseen 3D avatars with novel animations, achieving
superior zero-shot capability.
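To make the core mechanism concrete, below is a minimal sketch of the CLIP-guided optimization the abstract describes. This is not the authors' code: the differentiable avatar renderer is replaced by a single learnable RGB image (a hypothetical stand-in), the prompt is illustrative, and only the calls to OpenAI's public clip package reflect a real API. In AvatarCLIP itself, gradients from the same cosine loss would flow into implicit geometry and texture networks through volume rendering rather than into raw pixels.

```python
import torch
import clip  # OpenAI's CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # keep everything in fp32 for stable gradients
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; it only supervises

# Target description; its embedding is computed once and frozen.
tokens = clip.tokenize(["a 3D avatar of a tall knight in armor"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Hypothetical stand-in for the rendered avatar: one learnable 224x224 image.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# CLIP's expected input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(200):
    optimizer.zero_grad()
    img_feat = model.encode_image((image.clamp(0, 1) - mean) / std)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    # Loss = 1 - cosine similarity between rendered image and text.
    loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()
    loss.backward()
    optimizer.step()
```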
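The animation stage can be sketched the same way: freeze a pretrained motion VAE and optimize only its latent code so the decoded motion matches a CLIP-selected reference. Everything below is a toy stand-in, assuming nothing about the paper's implementation: MotionVAEDecoder and pose_render_score are hypothetical names, and the scoring function abbreviates "render candidate poses and score them against the text with CLIP".

```python
import torch
import torch.nn as nn

class MotionVAEDecoder(nn.Module):
    """Toy decoder: latent code -> sequence of SMPL-like pose vectors."""
    def __init__(self, latent_dim=32, seq_len=60, pose_dim=72):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * pose_dim),
        )
        self.seq_len, self.pose_dim = seq_len, pose_dim

    def forward(self, z):
        return self.net(z).view(-1, self.seq_len, self.pose_dim)

def pose_render_score(poses, reference):
    # Placeholder for the CLIP image-text score of rendered poses; here we
    # simply pull the decoded sequence toward a reference pose sequence.
    return -((poses - reference) ** 2).mean()

decoder = MotionVAEDecoder()
for p in decoder.parameters():
    p.requires_grad_(False)  # the motion prior stays frozen; only z is optimized

reference = torch.zeros(1, 60, 72)          # e.g. candidate poses picked via CLIP
z = torch.randn(1, 32, requires_grad=True)  # latent code to optimize
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = -pose_render_score(decoder(z), reference)
    loss.backward()
    opt.step()
```

Keeping the decoder frozen is what lets the VAE prior regularize the result: the optimized motion can only move within the manifold of plausible motions the VAE learned.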
Related papers
- DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z)
- Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute [33.330629835556664]
We propose Text2Avatar, which can generate realistic-style 3D avatars based on coupled text prompts.
To alleviate the scarcity of realistic style 3D human avatar data, we utilize a pre-trained unconditional 3D human avatar generation model.
arXiv Detail & Related papers (2024-01-01T09:39:57Z)
- Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models [107.84324544272481]
The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education.
Recent work on text-guided 3D object generation has shown great promise in addressing these needs.
We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task.
arXiv Detail & Related papers (2023-07-10T19:15:32Z)
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z)
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
- AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control [38.959851274747145]
AvatarCraft is a method for creating a 3D human avatar with a specific identity and artistic style that can be easily animated.
We use diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt.
We make the human avatar animatable by deforming the neural implicit field with an explicit warping field.
arXiv Detail & Related papers (2023-03-30T17:59:59Z)
- AvatarGen: A 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is an unsupervised method for generating 3D-aware clothed humans with various appearances and controllable geometries.
Our method can generate animatable 3D human avatars with high-quality appearance and geometry modeling.
It is well suited to many applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
arXiv Detail & Related papers (2022-11-26T15:15:45Z)
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.