Related papers: Multimodal Generation of Animatable 3D Human Models with AvatarForge

URL: http://arxiv.org/abs/2503.08165v1
Date: Tue, 11 Mar 2025 08:29:18 GMT
Title: Multimodal Generation of Animatable 3D Human Models with AvatarForge
Authors: Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
Abstract summary: AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
Score: 67.31920821192323
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle with high-quality, customizable human avatars due to the complexity and diversity of human body shapes and poses, a difficulty exacerbated by the scarcity of high-quality data. Additionally, animating these avatars remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models, which often rely on pre-trained datasets lacking precise control over individual human features, AvatarForge offers a more flexible approach: it brings humans into the iterative design and modeling loop, and its auto-verification system allows continuous refinement of the generated avatars, promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation, making it a versatile tool for artistic creation and animation.

Related papers

Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion [45.88321772203678]
We propose a novel approach that unites the strengths of person-specific rendering and diffusion-based generative modeling. Our method follows a two-stage pipeline: first, we optimize a set of person-specific UNets, with each network representing a dynamic human avatar. During inference, our method generates network weights for real-time, controllable rendering of dynamic human avatars.
arXiv Detail & Related papers (2025-09-04T12:15:55Z)
SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents [91.26239311240873]
SmartAvatar is a vision-language-agent-driven framework for generating fully rigged, animation-ready 3D human avatars. A key innovation is an autonomous verification loop, where the agent renders draft avatars. The generated avatars are fully rigged and support pose manipulation with consistent identity and appearance.
arXiv Detail & Related papers (2025-06-05T03:49:01Z)
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion [56.12859795754579]
AdaHuman is a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: a pose-conditioned 3D joint diffusion model and a compositional 3DGS refinement module.
arXiv Detail & Related papers (2025-05-30T17:59:54Z)
EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework. EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z)
DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt. It has two key designs that help achieve generation diversity and visual quality. Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z)
XAGen: 3D Expressive Human Avatars Generation [76.69560679209171]
XAGen is the first 3D generative model for human avatars capable of expressive control over body, face, and hands. We propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands. Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities.
arXiv Detail & Related papers (2023-11-22T18:30:42Z)
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images. Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models. We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way. Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all. We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z)
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars. We leverage the SMPL model to provide shape and pose guidance for the generation. We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints. To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space. Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.