EVA3D: Compositional 3D Human Generation from 2D Image Collections
- URL: http://arxiv.org/abs/2210.04888v1
- Date: Mon, 10 Oct 2022 17:59:31 GMT
- Title: EVA3D: Compositional 3D Human Generation from 2D Image Collections
- Authors: Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu
- Abstract summary: EVA3D is an unconditional 3D human generative model learned from 2D image collections only.
It can sample 3D humans with detailed geometry and render high-quality images (up to 512x256) without bells and whistles.
It achieves state-of-the-art 3D human generation performance regarding both geometry and texture quality.
- Score: 27.70991135165909
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Inverse graphics aims to recover 3D models from 2D observations. Utilizing
differentiable rendering, recent 3D-aware generative models have shown
impressive results of rigid object generation using 2D images. However, it
remains challenging to generate articulated objects, like human bodies, due to
their complexity and diversity in poses and appearances. In this work, we
propose, EVA3D, an unconditional 3D human generative model learned from 2D
image collections only. EVA3D can sample 3D humans with detailed geometry and
render high-quality images (up to 512x256) without bells and whistles (e.g.
super resolution). At the core of EVA3D is a compositional human NeRF
representation, which divides the human body into local parts. Each part is
represented by an individual volume. This compositional representation enables
1) inherent human priors, 2) adaptive allocation of network parameters, 3)
efficient training and rendering. Moreover, to accommodate the
characteristics of sparse 2D human image collections (e.g. imbalanced pose
distribution), we propose a pose-guided sampling strategy for better GAN
learning. Extensive experiments validate that EVA3D achieves state-of-the-art
3D human generation performance regarding both geometry and texture quality.
Notably, EVA3D demonstrates great potential and scalability to
"inverse-graphics" diverse human bodies with a clean framework.
Related papers
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D
Synthetic Data [36.51674664590734]
We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars.
Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing 3D humans.
arXiv Detail & Related papers (2024-01-02T12:06:31Z) - AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z) - 3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping [37.14866512377012]
3DHumanGAN is a 3D-aware generative adversarial network that synthesizes photorealistic images of full-body humans.
We propose a novel generator architecture in which a 2D convolutional backbone is modulated by a 3D pose mapping network.
arXiv Detail & Related papers (2022-12-14T17:59:03Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z) - 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z) - Fully Understanding Generic Objects: Modeling, Segmentation, and
Reconstruction [33.95791350070165]
Inferring 3D structure of a generic object from a 2D image is a long-standing objective of computer vision.
We take an alternative approach with semi-supervised learning. That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo.
We show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting.
arXiv Detail & Related papers (2021-04-02T02:39:29Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Towards Realistic 3D Embedding via View Alignment [53.89445873577063]
This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically.
VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable.
arXiv Detail & Related papers (2020-07-14T14:45:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.