SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
- URL: http://arxiv.org/abs/2504.06982v1
- Date: Wed, 09 Apr 2025 15:38:18 GMT
- Title: SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
- Authors: Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong
- Abstract summary: We propose a latent space generation paradigm for 3D human digitization. We transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift. We employ a multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset.
- Score: 72.26350984924129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcity of 3D human assets. Specifically, recent approaches fall into two paradigms: optimization-based and feed-forward (the latter covering both single-view regression and multi-view generation with reconstruction). However, they are limited by slow speed, low quality, cascaded inference, and the ambiguity of mapping low-dimensional planes to high-dimensional space under occlusion and invisibility, respectively. Furthermore, existing 3D human assets remain small-scale, insufficient for large-scale training. To address these challenges, we propose a latent space generation paradigm for 3D human digitization: by compressing multi-view images into Gaussians via a UV-structured VAE and coupling it with DiT-based conditional generation, we transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift, which also supports end-to-end inference. In addition, we employ a multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset, which contains $1$ million 3D Gaussian assets to support large-scale training. Experimental results demonstrate that our paradigm, powered by large-scale training, produces high-quality 3D human Gaussians with intricate textures, facial details, and loose clothing deformation.
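To make the two-stage pipeline above concrete, here is a minimal PyTorch sketch of the idea: a UV-structured VAE compresses multi-view images into a UV-space latent that decodes to per-texel 3D Gaussian parameters, and a DiT-style transformer denoises that latent conditioned on an image embedding. Every module name, channel count, and the toy sampling loop below are illustrative assumptions based only on the abstract, not the paper's released implementation.

```python
# Hypothetical sketch of the latent-space generation paradigm described in
# the abstract. The 14-channel Gaussian parameterization (3 position offset +
# 4 rotation quaternion + 3 scale + 1 opacity + 3 RGB) is an assumption.
import torch
import torch.nn as nn

class UVGaussianVAE(nn.Module):
    """Compresses multi-view images into a UV-structured latent and decodes
    that latent into a per-texel 3D Gaussian parameter map."""
    def __init__(self, n_views=4, latent_ch=8):
        super().__init__()
        # Encoder: channel-concatenated views -> mean/log-variance of latent.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * n_views, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 2 * latent_ch, 3, stride=2, padding=1),
        )
        # Decoder: UV latent -> 14-channel Gaussian parameter map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 14, 4, stride=2, padding=1),
        )

    def encode(self, views):                     # views: (B, n_views*3, H, W)
        mu, logvar = self.encoder(views).chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def decode(self, z):                         # z: (B, latent_ch, h, w)
        return self.decoder(z)                   # (B, 14, H, W) Gaussian map

class LatentDiT(nn.Module):
    """Toy stand-in for the conditional DiT: a single transformer block that
    denoises UV latents given an image embedding (kept tiny for brevity)."""
    def __init__(self, latent_ch=8, dim=256, cond_dim=256):
        super().__init__()
        self.patchify = nn.Conv2d(latent_ch, dim, 4, stride=4)   # 4x4 patches
        self.cond_proj = nn.Linear(cond_dim, dim)
        self.block = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.unpatchify = nn.ConvTranspose2d(dim, latent_ch, 4, stride=4)

    def forward(self, z_noisy, cond):            # cond: (B, cond_dim)
        tokens = self.patchify(z_noisy)          # (B, dim, h/4, w/4)
        B, D, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)  # (B, h*w, dim)
        seq = torch.cat([self.cond_proj(cond)[:, None], seq], dim=1)
        seq = self.block(seq)[:, 1:]             # attend, drop condition token
        return self.unpatchify(seq.transpose(1, 2).reshape(B, D, h, w))

# End-to-end inference sketch: condition -> denoised UV latent -> Gaussians.
vae, dit = UVGaussianVAE(), LatentDiT()
cond = torch.randn(1, 256)                       # e.g. an image embedding
z = torch.randn(1, 8, 16, 16)                    # start from pure noise
for _ in range(4):                               # placeholder sampling loop,
    z = z - 0.25 * (z - dit(z, cond))            # not a real diffusion solver
gaussians = vae.decode(z)                        # (1, 14, 64, 64) UV map
print(gaussians.shape)
```

The UV parameterization is presumably what makes the latent "structured": each texel of the decoded map corresponds to a fixed location on a human template surface, so a 2D convolutional VAE and a patch-based transformer can operate on 3D Gaussians as if they were images.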
Related papers
- MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior [35.704591162502375]
We present MVD-HuGaS, enabling free-view 3D human rendering from a single image via a multi-view human diffusion model.
Experiments on Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.
arXiv Detail & Related papers (2025-03-11T09:37:15Z) - TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [69.0220314849478]
TripoSG is a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images.
The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images.
To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
arXiv Detail & Related papers (2025-02-10T16:07:54Z) - DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z) - 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning [19.763523500564542]
CHASE is a novel framework that achieves dense-input-level performance using only sparse inputs.
We introduce a Dynamic Avatar Adjustment (DAA) module, which refines deformed Gaussians by leveraging similar poses from the training set.
While designed for sparse inputs, CHASE surpasses state-of-the-art methods across both full and sparse settings on ZJU-MoCap and H36M datasets.
arXiv Detail & Related papers (2024-08-19T02:46:23Z) - Generalizable Human Gaussians for Sparse View Synthesis [48.47812125126829]
This paper introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view rendering of a new human subject from a limited set of sparse views.
A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template.
Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings.
arXiv Detail & Related papers (2024-07-17T17:56:30Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - MVHuman: Tailoring 2D Diffusion with Multi-view Sampling for Realistic 3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z) - Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
arXiv Detail & Related papers (2023-06-20T13:01:19Z) - Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.