Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation
- URL: http://arxiv.org/abs/2411.14384v2
- Date: Tue, 26 Nov 2024 04:06:32 GMT
- Title: Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation
- Authors: Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Zhe Lin, Alan Yuille
- Abstract summary: We propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view.
Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA methods.
- Score: 45.95218923564575
- Abstract: Existing feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes and mainly handle object-centric prompt images. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency, allowing the model to generate robustly given prompt views of any direction, beyond object-centric inputs. In addition, to improve the capability and generalization of DiffusionGS, we scale up the 3D training data by developing a scene-object mixed training strategy. Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA methods. The user study and text-to-3D applications also reveal the practical value of our method. Our project page at https://caiyuanhao1998.github.io/project/DiffusionGS/ shows the video and interactive generation results.
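The core mechanism is that the denoiser itself emits the 3D representation: at every timestep it predicts a set of 3D Gaussians, which are rendered and supervised, so all views are constrained by one shared 3D state. Below is a minimal PyTorch sketch of a pixel-aligned Gaussian prediction head of the kind such a single-stage denoiser could end in; the module name, the ray-based depth parameterization, and the activation choices are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch (not the paper's code): a head that maps per-pixel denoiser
# features to 3D Gaussian parameters, so a rendering loss can be applied at
# every diffusion timestep.
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Per-pixel 3D Gaussian parameters from denoiser features (assumed layout)."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # depth (1) + rotation quaternion (4) + scale (3) + opacity (1) + RGB (3) = 12
        self.proj = nn.Conv2d(feat_dim, 12, kernel_size=1)

    def forward(self, feats, ray_origins, ray_dirs):
        # feats: (B, C, H, W); ray_origins / ray_dirs: (B, H, W, 3)
        out = self.proj(feats).permute(0, 2, 3, 1)               # (B, H, W, 12)
        depth = torch.sigmoid(out[..., :1])                      # bounded depth along each ray
        means = ray_origins + depth * ray_dirs                   # pixel-aligned Gaussian centers
        quats = nn.functional.normalize(out[..., 1:5], dim=-1)   # unit-quaternion rotations
        scales = torch.exp(out[..., 5:8])                        # positive anisotropic scales
        opacity = torch.sigmoid(out[..., 8:9])
        rgb = torch.sigmoid(out[..., 9:12])
        return means, quats, scales, opacity, rgb
```

Because the predicted Gaussians are rasterized and supervised at each denoising step, every step is tied to a single explicit 3D representation, which is plausibly where the claimed view consistency comes from.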
Related papers
- Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models [3.9373541926236766]
We present a latent diffusion model over 3D scenes that can be trained using only 2D image data.
We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, or from sparse input views.
arXiv Detail & Related papers (2024-06-18T23:14:29Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present Sculpt3D, a new framework that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without retraining the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- V3D: Video Diffusion Models are Effective 3D Generators [19.33837029942662]
We introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation.
Benefiting from this, a state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames around an object given a single image.
Our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views; a sketch of the orbit-pose setup follows this entry.
arXiv Detail & Related papers (2024-03-11T14:03:36Z)
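To make the "360-degree orbit frames" concrete, here is a minimal NumPy sketch of the kind of camera trajectory such a fine-tuned video model would be asked to cover; the radius, elevation, and OpenGL-style axis convention are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: camera-to-world poses evenly spaced on a circle around the
# object at the origin, i.e. a 360-degree orbit.
import numpy as np

def orbit_cameras(n_frames: int = 16, radius: float = 2.0, elevation_deg: float = 15.0):
    poses = []
    elev = np.radians(elevation_deg)
    for azim in np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False):
        # Camera position on the orbit circle (z is up).
        eye = radius * np.array([np.cos(azim) * np.cos(elev),
                                 np.sin(azim) * np.cos(elev),
                                 np.sin(elev)])
        forward = -eye / np.linalg.norm(eye)                   # look at the origin
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        c2w = np.eye(4)
        c2w[:3, :3] = np.stack([right, up, -forward], axis=1)  # OpenGL-style camera axes
        c2w[:3, 3] = eye
        poses.append(c2w)
    return np.stack(poses)                                     # (n_frames, 4, 4)
```

Each generated frame would be associated with one of these poses; denser or user-specified trajectories would play the same role in the scene-level novel view synthesis extension.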
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion [32.29687304798145]
One-2-3-45++ is an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute.
Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data.
arXiv Detail & Related papers (2023-11-14T03:40:25Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power of the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation; a sketch of the distillation step follows this entry.
arXiv Detail & Related papers (2023-10-12T17:22:24Z)
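The bridging the summary describes is typically realized by seeding the Gaussians with coarse geometry from the 3D diffusion prior and then refining them with gradients distilled from a frozen 2D diffusion model. Below is a minimal PyTorch sketch of such a score-distillation-style loss; treating this as GaussianDreamer's exact objective is an assumption, and `noise_pred_fn` is a hypothetical stand-in for the frozen 2D denoiser.

```python
# Minimal sketch of a score-distillation (SDS-style) loss: the gradient with
# respect to the rendered image is (eps_pred - noise), the direction the 2D
# diffusion model thinks the rendering should move in. Timestep weighting w(t)
# is omitted for brevity.
import torch

def sds_loss(rendered: torch.Tensor, noise_pred_fn, t: int, alphas_cumprod: torch.Tensor):
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t].sqrt()
    sigma_t = (1.0 - alphas_cumprod[t]).sqrt()
    noisy = a_t * rendered + sigma_t * noise        # forward diffusion q(x_t | x_0)
    with torch.no_grad():
        eps_pred = noise_pred_fn(noisy, t)          # frozen 2D diffusion denoiser
    # Detached residual: backprop gives d loss / d rendered = (eps_pred - noise).
    return ((eps_pred - noise).detach() * rendered).sum()
```

In this reading, the 3D prior supplies geometry that is cheap to obtain but coarse, while the 2D model, applied through a differentiable Gaussian splatting renderer, supplies the appearance quality and generalization that 3D-only training lacks.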
- Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images [70.17085345196583]
Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
arXiv Detail & Related papers (2023-04-13T17:52:29Z)
- 3D-aware Image Generation using 2D Diffusion Models [23.150456832947427]
We formulate the 3D-aware image generation task as multiview 2D image set generation, and further as a sequential unconditional-conditional multiview image generation process.
We utilize 2D diffusion models to boost the generative modeling power of the method.
We train our method on a large-scale dataset, ImageNet, which previous methods do not address; a sketch of the sequential sampling loop follows this entry.
arXiv Detail & Related papers (2023-03-31T09:03:18Z)
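The "sequential unconditional-conditional" process can be stated compactly: the first view is sampled freely, and every later view is sampled conditioned on all views generated so far. A minimal Python sketch follows; the two sampler callables are hypothetical placeholders for the underlying 2D diffusion models.

```python
# Minimal sketch of sequential unconditional-then-conditional multiview sampling.
from typing import Callable, List, Sequence

def generate_multiview_set(
    sample_unconditional: Callable[[object], object],
    sample_conditional: Callable[[List[object], object], object],
    cameras: Sequence[object],
) -> List[object]:
    # The first view anchors the identity and appearance of the instance.
    views = [sample_unconditional(cameras[0])]
    # Each later view is generated conditioned on everything sampled so far,
    # which is what ties the set into a single consistent 3D scene.
    for cam in cameras[1:]:
        views.append(sample_conditional(views, cam))
    return views
```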
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on the FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for 3D scene generation and for inferring 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)