A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
- URL: http://arxiv.org/abs/2412.00623v2
- Date: Sat, 07 Dec 2024 07:21:01 GMT
- Title: A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
- Authors: Chensheng Peng, Ido Sobol, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu, Or Litany,
- Abstract summary: We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.
Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
- Score: 65.33043028101471
- License:
- Abstract: We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images, addressing the ill-posed nature of lifting 2D inputs to 3D. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data. Diffusion models have recently shown promise as powerful generative models for 3D data, including Gaussian splats; however, standard diffusion frameworks typically require the target signal and denoised signal to be in the same modality, which is challenging given the scarcity of 3D data. To overcome this, we propose a novel training strategy that decouples the denoised modality from the supervision modality. By using a deterministic model as a noisy teacher to create the noised signal and transitioning from single-step to multi-step denoising supervised by an image rendering loss, our approach significantly enhances performance compared to the deterministic teacher. Additionally, our method is flexible, as it can learn from various 3D Gaussian Splat (3DGS) teachers with minimal adaptation; we demonstrate this by surpassing the performance of two different deterministic models as teachers, highlighting the potential generalizability of our framework. Our approach further incorporates a guidance mechanism to aggregate information from multiple views, enhancing reconstruction quality when more than one view is available. Experimental results on object-level and scene-level datasets demonstrate the effectiveness of our framework.
Related papers
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation [33.62074896816882]
DiffSplat is a novel 3D generative framework that generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.
It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model.
In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views.
arXiv Detail & Related papers (2025-01-28T07:38:59Z) - LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z) - GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z) - Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [8.07701188057789]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data.
Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues.
Our method achieves up to 85% of the fully-supervised performance using only 10% labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z) - Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene [22.297964850282177]
We propose LiDAR-2D Self-paced Learning (LiSe) for unsupervised 3D detection.
RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues.
Our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies.
arXiv Detail & Related papers (2024-07-11T14:58:49Z) - GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we include noises to produce corrupted 3D object size and class label, distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z) - Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training [9.272389295055271]
This study investigates the pipeline for training a monocular 3D object detection model on a diverse collection of 3D and 2D datasets.
The proposed framework comprises three components: (1) a robust monocular 3D model capable of functioning across various camera settings, (2) a selective-training strategy to accommodate datasets with differing class annotations, and (3) a pseudo 3D training approach using 2D labels to enhance detection performance in scenes containing only 2D labels.
arXiv Detail & Related papers (2023-10-02T06:17:24Z) - Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D
Generation [28.25023686484727]
A diffusion model learns to predict a vector field of gradients.
We propose a chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable field.
We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.
arXiv Detail & Related papers (2022-12-01T18:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.