MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic
3D Human Generation
- URL: http://arxiv.org/abs/2312.10120v1
- Date: Fri, 15 Dec 2023 11:56:26 GMT
- Title: MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic
3D Human Generation
- Authors: Suyi Jiang, Haimin Luo, Haoran Jiang, Ziyu Wang, Jingyi Yu, Lan Xu
- Abstract summary: We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
- Score: 45.88714821939144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent months have witnessed rapid progress in 3D generation based on
diffusion models. Most advances require fine-tuning existing 2D Stable
Diffusion models into multi-view settings or tedious distillation, and hence
fall short of 3D human generation due to the lack of diverse 3D human datasets.
We present an alternative scheme named MVHuman to generate human radiance
fields from text guidance, with consistent multi-view images directly sampled
from pre-trained Stable Diffusion without any fine-tuning or distillation. Our
core is a multi-view sampling strategy that tailors the denoising processes of
the pre-trained network to generate consistent multi-view images. It encompasses
view-consistent conditioning, replacing the original noises with
"consistency-guided noises", optimizing latent codes, and utilizing
cross-view attention layers. With the multi-view images obtained through this
sampling process, we apply geometry refinement and 3D radiance field generation,
followed by a neural blending scheme for free-view rendering. Extensive
experiments demonstrate the efficacy of our method, as well as its superiority
over state-of-the-art 3D human generation methods.
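For intuition, the following is a minimal sketch of one plausible reading of the multi-view sampling idea: several view latents are denoised in lockstep with a frozen 2D diffusion UNet, and each view's predicted noise is blended toward a cross-view consensus ("consistency-guided noise") before the DDIM update. The names `unet`, `conds`, `warp_to_view`, and `blend_weight`, as well as the scheduler details, are illustrative assumptions rather than the paper's actual interface; view-consistent conditioning, latent-code optimization, and cross-view attention layers are omitted here.

```python
import torch

def ddim_step(z_t, eps, a_t, a_prev):
    """Deterministic DDIM update: move the latent from timestep t to the previous one."""
    x0_pred = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps

@torch.no_grad()
def multi_view_sample(unet, conds, warp_to_view, alphas_bar, timesteps,
                      latent_shape, blend_weight=0.5, device="cuda"):
    """Jointly denoise one latent per view, nudging each view's predicted noise
    toward a cross-view consensus ("consistency-guided noise")."""
    num_views = len(conds)
    latents = [torch.randn(latent_shape, device=device) for _ in range(num_views)]
    for i, t in enumerate(timesteps):                       # timesteps in descending order
        a_t = alphas_bar[t]
        a_prev = alphas_bar[timesteps[i + 1]] if i + 1 < len(timesteps) \
            else torch.tensor(1.0, device=device)
        # 1) per-view noise prediction from the frozen, pre-trained 2D UNet
        eps = [unet(latents[v], t, conds[v]) for v in range(num_views)]
        # 2) clean-image estimate implied by each view's current latent
        x0 = [(latents[v] - (1 - a_t).sqrt() * eps[v]) / a_t.sqrt()
              for v in range(num_views)]
        for v in range(num_views):
            # 3) pull the other views' estimates into view v's frame and average;
            #    `warp_to_view` (e.g. depth- or body-prior-based warping) is an assumed operator
            x0_consensus = torch.stack(
                [warp_to_view(x0[u], src=u, dst=v) for u in range(num_views)]).mean(dim=0)
            # 4) the noise that would explain the consensus estimate
            eps_consistent = (latents[v] - a_t.sqrt() * x0_consensus) / (1 - a_t).sqrt()
            # 5) blend original and consensus noise, then take the DDIM step
            eps_guided = (1 - blend_weight) * eps[v] + blend_weight * eps_consistent
            latents[v] = ddim_step(latents[v], eps_guided, a_t, a_prev)
    return latents
```

The design choice illustrated here is that consistency is enforced on the predicted clean images rather than on the network weights, so the pre-trained 2D model itself is never modified.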
Related papers
- HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion [50.02316409061741]
HuGDiffusion is a learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images.
We aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image.
Our HuGDiffusion shows significant performance improvements over the state-of-the-art methods.
arXiv Detail & Related papers (2025-01-25T01:00:33Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies caused by a lack of comprehensive multi-view knowledge.
We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Envision3D: One Image to 3D with Anchor Views Interpolation [18.31796952040799]
We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image.
It is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.
arXiv Detail & Related papers (2024-03-13T18:46:33Z)
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces reference-guided score distillation sampling (a generic score-distillation sketch appears after this list).
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
arXiv Detail & Related papers (2023-11-28T17:06:28Z)
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
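Several entries above (HumanRef, CAD, MVD-Fusion, Sparse3D) build on score distillation from a frozen 2D diffusion prior, which the MVHuman abstract contrasts with its direct sampling scheme. Below is a minimal sketch of plain score distillation sampling in its original DreamFusion form, not any specific paper's variant; `render`, `unet_eps`, `params`, `cond`, and the timestep and weighting choices are illustrative assumptions.

```python
import torch

def sds_loss(render, unet_eps, params, cond, alphas_bar, device="cuda"):
    """One score distillation step: render the 3D representation, diffuse the
    image, and turn the frozen 2D model's denoising error into a gradient."""
    x = render(params)                                   # differentiable render, (B, C, H, W) in [-1, 1]
    t = torch.randint(20, 980, (1,), device=device)      # random mid-range diffusion timestep
    a_t = alphas_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    x_t = a_t.sqrt() * x + (1 - a_t).sqrt() * noise      # forward-diffuse the rendered view
    with torch.no_grad():                                # never backprop through the 2D prior
        eps_pred = unet_eps(x_t, t, cond)
    w = 1.0 - a_t                                        # a common weighting choice
    grad = w * (eps_pred - noise)                        # SDS gradient w.r.t. the rendered image
    # Surrogate loss whose gradient w.r.t. x equals `grad`; backprop flows
    # through the renderer into the 3D parameters only.
    return (grad.detach() * x).sum()
```

Calling `backward()` on this loss updates only the 3D parameters while the diffusion prior stays untouched; MVHuman's abstract positions its direct multi-view sampling as an alternative to this kind of distillation.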
This list is automatically generated from the titles and abstracts of the papers on this site.