HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion
- URL: http://arxiv.org/abs/2501.15008v1
- Date: Sat, 25 Jan 2025 01:00:33 GMT
- Title: HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion
- Authors: Yingzhi Tang, Qijian Zhang, Junhui Hou,
- Abstract summary: HuGDiffusion is a learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images.
We aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image.
Our HuGDiffusion shows significant performance improvements over the state-of-the-art methods.
- Score: 50.02316409061741
- License:
- Abstract: We present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as inputs, whose applicability could be weakened in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to deduce informative conditioning signals. Based on our empirical observations that jointly learning the whole 3DGS attributes is challenging to optimize, we design a multi-stage generation strategy to obtain different types of 3DGS attributes. To facilitate the training process, we investigate constructing proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. Through extensive experiments, our HuGDiffusion shows significant performance improvements over the state-of-the-art methods. Our code will be made publicly available.
Related papers
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z) - NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z) - DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z) - EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings [11.248908608011941]
EVA-Gaussian is a real-time pipeline for 3D human novel view synthesis across diverse camera settings.
We introduce an Efficient cross-View Attention (EVA) module to accurately estimate the position of each 3D Gaussian from the source images.
We incorporate a powerful anchor loss function for both 3D Gaussian attributes and human face landmarks.
arXiv Detail & Related papers (2024-10-02T11:23:08Z) - WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics.
We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections.
Our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high converge and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z) - 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic
3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.