Multimodal Generation of Animatable 3D Human Models with AvatarForge
- URL: http://arxiv.org/abs/2503.08165v1
- Date: Tue, 11 Mar 2025 08:29:18 GMT
- Title: Multimodal Generation of Animatable 3D Human Models with AvatarForge
- Authors: Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang,
- Abstract summary: AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation.<n>Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
- Score: 67.31920821192323
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle with high-quality, customizable human avatars due to the complexity and diversity of human body shapes, poses, exacerbated by the scarcity of high-quality data. Additionally, animating these avatars remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models which often rely on pre-trained datasets lacking precise control over individual human features, AvatarForge offers a more flexible approach, bringing humans into the iterative design and modeling loop, with its auto-verification system allowing for continuous refinement of the generated avatars, and thus promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation, making it a versatile tool for artistic creation and animation.
Related papers
- DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D
Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z) - DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt.
It has two key designs that help achieve generation diversity and visual quality.
Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z) - XAGen: 3D Expressive Human Avatars Generation [76.69560679209171]
XAGen is the first 3D generative model for human avatars capable of expressive control over body, face, and hands.
We propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands.
Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities.
arXiv Detail & Related papers (2023-11-22T18:30:42Z) - AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z) - Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way.
Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all.
We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z) - DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via
Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face ''Janus'' problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z) - AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.