3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models
- URL: http://arxiv.org/abs/2403.11111v2
- Date: Thu, 11 Apr 2024 12:01:34 GMT
- Title: 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models
- Authors: Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen,
- Abstract summary: We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
- Score: 52.96248836582542
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS). Specifically, we propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations. We first collect a large-scale human-centric dataset with comprehensive annotations, e.g., text captions and surface normal images. Then, we train a customized ControlNet model upon this dataset to generate diverse human images and initial ground-truth labels. At the core of this step is that we can easily obtain numerous surface normal images from a 3D human parametric model, e.g., SMPL-X, by rendering the 3D mesh onto the image plane. As there exists inevitable noise in the initial labels, we then apply an off-the-shelf foundation segmentation model, i.e., SAM, to filter negative data samples. Our data generation pipeline is flexible and customizable to facilitate different real-world tasks, e.g., ego-centric scenes and perspective-distortion scenes. The generated dataset comprises 0.79M images with corresponding 3D annotations, covering versatile viewpoints, scenes, and human identities. We train various HPS regressors on top of the generated data and evaluate them on a wide range of benchmarks (3DPW, RICH, EgoBody, AGORA, SSP-3D) to verify the effectiveness of the generated data. By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
Related papers
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z) - 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object in various structure and appearance via diffusion models and chatGPT generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z) - Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability - both at training and test time - to query any arbitrary point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z) - GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections.
By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained.
We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - Learning Dense Correspondence from Synthetic Environments [27.841736037738286]
Existing methods map manually labelled human pixels in real 2D images onto the 3D surface, which is prone to human error.
We propose to solve the problem of data scarcity by training 2D-3D human mapping algorithms using automatically generated synthetic data.
arXiv Detail & Related papers (2022-03-24T08:13:26Z) - 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z) - UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body
Decoupling 3D Model [58.70130563417079]
We introduce a new 3D human-body model with a series of decoupled parameters that could freely control the generation of the body.
Compared to the existing manually annotated DensePose-COCO dataset, the synthetic UltraPose has ultra dense image-to-surface correspondences without annotation cost and error.
arXiv Detail & Related papers (2021-10-28T16:24:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.