Related papers: Instant 3D Human Avatar Generation using Image Diffusion Models

Instant 3D Human Avatar Generation using Image Diffusion Models

URL: http://arxiv.org/abs/2406.07516v2
Date: Fri, 12 Jul 2024 11:23:26 GMT
Title: Instant 3D Human Avatar Generation using Image Diffusion Models
Authors: Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu,
Abstract summary: AvatarPopUp is a method for fast, high quality 3D human avatar generation from different input modalities. Our approach can produce a 3D model in as few as 2 seconds.
Score: 37.45927867788691
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present AvatarPopUp, a method for fast, high quality 3D human avatar generation from different input modalities, such as images and text prompts and with control over the generated pose and shape. The common theme is the use of diffusion-based image generation networks that are specialized for each particular task, followed by a 3D lifting network. We purposefully decouple the generation from the 3D modeling which allow us to leverage powerful image synthesis priors, trained on billions of text-image pairs. We fine-tune latent diffusion networks with additional image conditioning for image generation and back-view prediction, and to support qualitatively different multiple 3D hypotheses. Our partial fine-tuning approach allows to adapt the networks for each task without inducing catastrophic forgetting. In our experiments, we demonstrate that our method produces accurate, high-quality 3D avatars with diverse appearance that respect the multimodal text, image, and body control signals. Our approach can produce a 3D model in as few as 2 seconds, a four orders of magnitude speedup wrt the vast majority of existing methods, most of which solve only a subset of our tasks, and with fewer controls. AvatarPopUp enables applications that require the controlled 3D generation of human avatars at scale. The project website can be found at https://www.nikoskolot.com/avatarpopup/.

Related papers

Multimodal Generation of Animatable 3D Human Models with AvatarForge [67.31920821192323]
AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
arXiv Detail & Related papers (2025-03-11T08:29:18Z)
BAG: Body-Aligned 3D Wearable Asset Generation [59.7545477546307]
Bag is a Body-aligned Asset Generation method to output 3D wearable asset that can be automatically dressed on given 3D human bodies. Results demonstrate significant advantages over existing methods in terms of image prompt-following capability, shape diversity, and shape quality.
arXiv Detail & Related papers (2025-01-27T16:23:45Z)
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z)
DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt. It has two key designs that help achieve generation diversity and visual quality. Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z)
AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning [61.59722900152847]
We introduce an approach for 3D head avatar generation and editing based on a 3D Generative Adversarial Network (GAN) and a Latent Diffusion Model (LDM) We exploit the conditioning capabilities of LDMs to enable multi-modal control over the latent space of a pre-trained 3D GAN. Our method can generate and edit 3D head avatars given a mixture of control signals such as RGB input, segmentation masks, and global attributes.
arXiv Detail & Related papers (2024-02-08T16:41:20Z)
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images. Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models. We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation [103.88928334431786]
We present a novel method for generating high-quality, stylized 3D avatars. We use pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training. Our approach demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars.
arXiv Detail & Related papers (2023-05-30T13:09:21Z)
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars. We leverage the SMPL model to provide shape and pose guidance for the generation. We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face ''Janus'' problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
AvatarGen: A 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is an unsupervised generation of 3D-aware clothed humans with various appearances and controllable geometries. Our method can generate animatable 3D human avatars with high-quality appearance and geometry modeling. It is competent for many applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
arXiv Detail & Related papers (2022-11-26T15:15:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.