Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
- URL: http://arxiv.org/abs/2412.03515v1
- Date: Wed, 04 Dec 2024 17:57:25 GMT
- Title: Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
- Authors: Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun
- Abstract summary: Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed $\textbf{ScoreLiDAR}$, which achieves efficient yet high-quality scene completion.
- Score: 25.517559974601813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, the slow sampling speed limits the practical application of diffusion-based scene completion models since autonomous vehicles require an efficient perception of surrounding environments. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed $\textbf{ScoreLiDAR}$, which achieves efficient yet high-quality scene completion. ScoreLiDAR enables the distilled model to sample in significantly fewer steps after distillation. To improve completion quality, we also introduce a novel $\textbf{Structural Loss}$, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term constraining the holistic structure and a point-wise term constraining the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR significantly accelerates the completion time from 30.55 to 5.37 seconds per frame ($>$5$\times$) on SemanticKITTI and achieves superior performance compared to state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.
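- Note: The Structural Loss is described only at a high level above. A plausible form, assuming a standard weighted combination (the coefficients and distance measures here are illustrative assumptions, not the paper's exact formulation), is $\mathcal{L}_{\text{structural}} = \lambda_{\text{scene}}\,\mathcal{L}_{\text{scene}} + \lambda_{\text{point}}\,\mathcal{L}_{\text{point}}$, where $\mathcal{L}_{\text{scene}}$ penalizes discrepancies between the holistic structure of the distilled model's completed scene and the teacher's, and $\mathcal{L}_{\text{point}}$ penalizes discrepancies in the positions and pairwise distances of key landmark points, constraining their relative configuration.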
Related papers
- Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [25.55163699029964]
This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene completion with preference alignment.
To the best of our knowledge, our method is the first to adopt preference learning in distillation, and it provides insights into preference-aligned distillation.
arXiv Detail & Related papers (2025-04-15T17:57:13Z) - LiHi-GS: LiDAR-Supervised Gaussian Splatting for Highway Driving Scene Reconstruction [6.428928591765432]
Gaussian Splatting (GS) facilitates real-time rendering with an explicit 3D Gaussian representation of the scene.
GS provides faster processing and more intuitive scene editing than implicit Neural Radiance Fields (NeRFs).
We propose a novel GS method for dynamic scene synthesis and editing with improved scene reconstruction through LiDAR supervision and support for LiDAR rendering.
arXiv Detail & Related papers (2024-12-19T22:59:55Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
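As a concrete illustration of this layout representation, a minimal sketch follows; the field names and types are illustrative assumptions, not MiDiffusion's actual data format:

```python
# Hypothetical sketch of the scene-layout representation summarized above.
# Field names and types are assumptions, not MiDiffusion's published format.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class SceneObject:
    category: int          # discrete object class index
    location: np.ndarray   # continuous position, e.g. (x, y) on the floor plan
    size: np.ndarray       # continuous bounding-box extents
    orientation: float     # continuous yaw angle in radians

@dataclass
class SceneLayout:
    floor_plan: np.ndarray       # 2D occupancy mask of the room boundary
    objects: List[SceneObject]   # objects placed on the floor plan
```

A mixed discrete-continuous diffusion model fits this split naturally: the categorical attribute can be handled by the discrete component, while location, size, and orientation are handled by the continuous component.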
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving [82.82048452755394]
Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.
Most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements.
We propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency.
arXiv Detail & Related papers (2024-05-30T17:57:08Z) - Towards Realistic Scene Generation with LiDAR Diffusion Models [15.487070964070165]
Diffusion models (DMs) excel in photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle.
We propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes.
Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for a full 3D object context.
arXiv Detail & Related papers (2024-03-31T22:18:56Z) - SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z) - Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion [25.69896680908217]
3D LiDAR sensors are commonly used to collect sparse 3D point clouds from the scene.
We propose extending diffusion models, originally developed as generative models for images, to achieve scene completion from a single 3D LiDAR scan.
Our method can complete the scene given a single LiDAR scan as input, producing a scene with more details compared to state-of-the-art scene completion methods.
arXiv Detail & Related papers (2024-03-20T10:19:05Z) - PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models [51.24979014650188]
We present PointSeg, a training-free paradigm that leverages off-the-shelf vision foundation models to address 3D scene perception tasks.
PointSeg can segment anything in a 3D scene by acquiring accurate 3D prompts that align their corresponding pixels across frames.
Our approach significantly surpasses the state-of-the-art specialist training-free model by 14.1%, 12.3%, and 12.6% mAP on the ScanNet, ScanNet++, and KITTI-360 datasets.
arXiv Detail & Related papers (2024-03-11T03:28:20Z) - PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments [3.1969023045814753]
We propose a 3D scene reconstruction and novel view synthesis framework called parent-child neural radiance field (PC-NeRF).
PC-NeRF implements hierarchical spatial partitioning and multi-level scene representation, including scene, segment, and point levels.
With extensive experiments, PC-NeRF is proven to achieve high-precision novel LiDAR view synthesis and 3D reconstruction in large-scale scenes.
arXiv Detail & Related papers (2024-02-14T17:16:39Z) - Pyramid Diffusion for Fine 3D Large Scene Generation [56.00726092690535]
Diffusion models have shown remarkable results in generating 2D images and small-scale 3D objects.
Their application to the synthesis of large-scale 3D scenes has rarely been explored.
We introduce a framework, the Pyramid Discrete Diffusion model (PDD), which employs scale-varied diffusion models to progressively generate high-quality outdoor scenes.
arXiv Detail & Related papers (2023-11-20T11:24:21Z) - Weakly Supervised 3D Object Detection with Multi-Stage Generalization [62.96670547848691]
We introduce BA$^2$-Det, encompassing pseudo label generation and multi-stage generalization.
We develop three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant.
BA$^2$-Det can achieve a 20% relative improvement on the KITTI dataset.
arXiv Detail & Related papers (2023-06-08T17:58:57Z) - LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR [5.900616958195897]
Scene completion refers to obtaining dense scene representation from an incomplete perception of complex 3D scenes.
Recent advances show that implicit representation learning can be leveraged for continuous scene completion.
We propose a novel Eikonal formulation that conditions the implicit representation on localized shape priors which function as dense boundary value constraints.
arXiv Detail & Related papers (2023-02-27T18:59:58Z) - Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution [34.713667358316286]
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely.
Existing 3D perception models cannot recognize small instances well due to low-resolution voxelization and aggressive downsampling.
We propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch.
arXiv Detail & Related papers (2020-07-31T14:27:27Z)