AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal
Conditioning
- URL: http://arxiv.org/abs/2402.05803v1
- Date: Thu, 8 Feb 2024 16:41:20 GMT
- Title: AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal
Conditioning
- Authors: Wamiq Reyaz Para, Abdelrahman Eldesokey, Zhenyu Li, Pradyumna Reddy,
Jiankang Deng, Peter Wonka
- Abstract summary: We introduce an approach for 3D head avatar generation and editing based on a 3D Generative Adversarial Network (GAN) and a Latent Diffusion Model (LDM).
We exploit the conditioning capabilities of LDMs to enable multi-modal control over the latent space of a pre-trained 3D GAN.
Our method can generate and edit 3D head avatars given a mixture of control signals such as RGB input, segmentation masks, and global attributes.
- Score: 61.59722900152847
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce an approach for 3D head avatar generation and editing with
multi-modal conditioning based on a 3D Generative Adversarial Network (GAN) and
a Latent Diffusion Model (LDM). 3D GANs can generate high-quality head avatars
given a single condition or none at all. However, it is challenging for such models to generate samples
that adhere to multiple conditions of different modalities. On the other hand,
LDMs excel at learning complex conditional distributions. To this end, we
propose to exploit the conditioning capabilities of LDMs to enable multi-modal
control over the latent space of a pre-trained 3D GAN. Our method can generate
and edit 3D head avatars given a mixture of control signals such as RGB input,
segmentation masks, and global attributes. This provides better control over
the generation and editing of synthetic avatars both globally and locally.
Experiments show that our proposed approach outperforms a solely GAN-based
approach both qualitatively and quantitatively on generation and editing tasks.
To the best of our knowledge, our approach is the first to introduce
multi-modal conditioning to 3D avatar generation and editing.
Project Page: avatarmmc-sig24.github.io
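The core mechanism described in the abstract, an LDM conditioned on multi-modal signals that operates in the latent space of a frozen, pre-trained 3D GAN, can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the denoiser architecture, condition encoders, dimensions, and the `gan.synthesize` call are all assumptions.

```python
# Minimal sketch (assumptions throughout, not the authors' code): an LDM
# denoises latent codes of a frozen, pre-trained 3D GAN, conditioned on
# embeddings of multi-modal control signals.
import torch
import torch.nn as nn

LATENT_DIM, COND_DIM, T = 512, 768, 1000  # hypothetical sizes and step count

class LatentDenoiser(nn.Module):
    """Predicts the noise in a GAN latent code, given a timestep and the
    concatenated condition embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 1 + COND_DIM, 1024), nn.SiLU(),
            nn.Linear(1024, 1024), nn.SiLU(),
            nn.Linear(1024, LATENT_DIM),
        )

    def forward(self, w_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / T  # crude timestep embedding
        return self.net(torch.cat([w_t, t_emb, cond], dim=-1))

def encode_conditions(rgb, mask, attrs, encoders):
    """One encoder per modality; embeddings are concatenated. A missing
    modality could be replaced by a learned null token, enabling
    classifier-free-guidance-style control over each signal."""
    return torch.cat(
        [encoders["rgb"](rgb), encoders["mask"](mask), encoders["attr"](attrs)],
        dim=-1,
    )

@torch.no_grad()
def sample_latent(denoiser, cond, betas):
    """Plain DDPM ancestral sampling, run in the GAN's latent space."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    w = torch.randn(cond.shape[0], LATENT_DIM)
    for t in reversed(range(T)):
        t_batch = torch.full((cond.shape[0],), t)
        eps = denoiser(w, t_batch, cond)
        w = (w - (1 - alphas[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            w = w + betas[t].sqrt() * torch.randn_like(w)
    return w  # render with the frozen GAN, e.g. avatar = gan.synthesize(w)
```

For editing, one plausible route is to obtain the latent of an existing avatar (e.g. via GAN inversion) and re-run the conditional sampler from a partially noised version of it, preserving overall appearance while applying the new controls.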
Related papers
- RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models [56.13752698926105]
We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image.
We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars.
We optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues, and injecting them to the 3D diffusion model at multiple layers via cross-attention.
When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild portraits.
arXiv Detail & Related papers (2024-07-09T15:14:45Z)
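The cross-attention injection described in the RodinHD entry above could look roughly like this sketch; the token shapes, layer placement, and feature sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PortraitCrossAttention(nn.Module):
    """Sketch of injecting 2D portrait features into a diffusion denoiser
    layer via cross-attention. One such module would sit at each of several
    depths, so each level of the hierarchical portrait representation guides
    the matching resolution of the denoiser (illustrative, not RodinHD)."""
    def __init__(self, dim, ctx_dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True
        )

    def forward(self, x, portrait_feats):
        # x: (B, N, dim) denoiser tokens; portrait_feats: (B, M, ctx_dim)
        h, _ = self.attn(self.norm(x), portrait_feats, portrait_feats)
        return x + h  # residual injection keeps the backbone's behavior intact

# Hypothetical usage: attend to one level of a portrait feature pyramid.
layer = PortraitCrossAttention(dim=320, ctx_dim=256)
tokens = torch.randn(1, 1024, 320)
feats = torch.randn(1, 16 * 16, 256)
out = layer(tokens, feats)
```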
- Instant 3D Human Avatar Generation using Image Diffusion Models [37.45927867788691]
AvatarPopUp is a method for fast, high-quality 3D human avatar generation from different input modalities.
Our approach can produce a 3D model in as few as 2 seconds.
arXiv Detail & Related papers (2024-06-11T17:47:27Z)
- $E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation [71.72171053129655]
This paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation.
We propose a novel avatar generation method named $E^{3}$Gen to effectively address these challenges.
Our method achieves superior performance in avatar generation and enables expressive full-body pose control and editing.
arXiv Detail & Related papers (2024-05-29T15:43:49Z)
- GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image [89.70322127648349]
We propose a generic avatar editing approach that can be universally applied to various 3DMM driving volumetric head avatars.
To achieve this goal, we design a novel expression-aware modification generative model, which lifts 2D edits from a single image to a consistent 3D modification field.
arXiv Detail & Related papers (2024-04-02T17:58:35Z)
- Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation [14.064983137553353]
We aim to enhance the quality and functionality of generative diffusion models for the task of creating controllable, photorealistic human avatars.
We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach.
Our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars.
arXiv Detail & Related papers (2024-01-09T18:59:04Z)
- XAGen: 3D Expressive Human Avatars Generation [76.69560679209171]
XAGen is the first 3D generative model for human avatars capable of expressive control over body, face, and hands.
We propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands.
Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities.
arXiv Detail & Related papers (2023-11-22T18:30:42Z)
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
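The coarse-to-fine, multi-resolution supervision that the AvatarBooth entry above mentions might be organized along these lines; the resolutions, the loss, and the `render_fn` interface are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_loss(render_fn, target, resolutions=(64, 128, 256)):
    """Illustrative multi-resolution supervision (not AvatarBooth's code):
    render the avatar at several resolutions and supervise each scale, so
    coarse shape is constrained before fine texture detail is optimized."""
    total = torch.zeros(())
    for res in resolutions:
        pred = render_fn(res)  # hypothetical renderer: res -> (B, 3, res, res)
        ref = F.interpolate(target, size=(res, res),
                            mode="bilinear", align_corners=False)
        total = total + F.mse_loss(pred, ref)
    return total
```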
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
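The joint full-body and zoomed-in-head objective from the DreamAvatar entry can be illustrated as below; the `render` and `guidance` interfaces and the weighting are assumptions, with `guidance` standing in for whatever text-conditioned score (e.g. an SDS-style loss) the framework uses.

```python
def joint_body_head_loss(render, guidance, body_cam, head_cam, w_head=0.5):
    """Sketch (not DreamAvatar's code): score the avatar from two views and
    sum the losses. `render` maps a camera to an image; `guidance` scores an
    image against the text prompt. Supervising the zoomed-in head separately
    counteracts the duplicated-face ("Janus") artifact."""
    body_img = render(body_cam)  # full-body render
    head_img = render(head_cam)  # camera zoomed onto the 3D head
    return guidance(body_img) + w_head * guidance(head_img)
```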