Related papers: DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models

DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models

URL: http://arxiv.org/abs/2409.18092v2
Date: Mon, 30 Sep 2024 18:14:02 GMT
Title: DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models
Authors: Helin Cao, Sven Behnke,
Abstract summary: 3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings. Such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. We jointly predict unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation.
Score: 18.342569823885864
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Perception systems play a crucial role in autonomous driving, incorporating multiple sensors and corresponding computer vision algorithms. 3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings. However, such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. To address these challenges, Semantic Scene Completion (SSC) jointly predicts unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation. Building on promising results of diffusion models in image generation and super-resolution tasks, we propose their extension to SSC by implementing the noising and denoising diffusion processes in the point and semantic spaces individually. To control the generation, we employ semantic LiDAR point clouds as conditional input and design local and global regularization losses to stabilize the denoising process. We evaluate our approach on autonomous driving datasets and our approach outperforms the state-of-the-art for SSC.

Related papers

SWA-SOP: Spatially-aware Window Attention for Semantic Occupancy Prediction in Autonomous Driving [16.320467417627277]
Spatially-aware Window Attention (SWA) is a novel mechanism that incorporates local spatial context into attention.<n>SWA significantly improves scene completion and achieves state-of-the-art results on LiDAR-based SOP benchmarks.<n>We further validate its generality by integrating SWA into a camera-based SOP pipeline.
arXiv Detail & Related papers (2025-06-23T15:54:28Z)
A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction [4.824120664293887]
SatelliteMaker is a diffusion-based method that reconstructs missing data across varying levels of data loss. Digital Elevation Model (DEM) as a conditioning input and use tailored prompts to generate realistic images. VGG-Adapter module based on Distribution Loss, which reduces distribution discrepancy and ensures style consistency.
arXiv Detail & Related papers (2025-04-16T14:19:57Z)
FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.<n>We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.<n>FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes with the format of nuScenes. We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z)
Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL) It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving [45.886941596233974]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation. Our proposed framework performs L-OGM prediction in the latent space of a generative architecture. We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder.
arXiv Detail & Related papers (2024-07-30T18:37:59Z)
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising [49.86409475232849]
Trajectory prediction is fundamental in computer vision and autonomous driving. Existing approaches in this field often assume precise and complete observational data. We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z)
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion [25.69896680908217]
3D LiDAR sensors are commonly used to collect sparse 3D point clouds from the scene. We propose extending diffusion models as generative models for images to achieve scene completion from a single 3D LiDAR scan. Our method can complete the scene given a single LiDAR scan as input, producing a scene with more details compared to state-of-the-art scene completion methods.
arXiv Detail & Related papers (2024-03-20T10:19:05Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency [3.124750429062221]
We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects. The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models. We also showcase the effectiveness and generalization capability of our framework on four standard sensor-unique driving datasets.
arXiv Detail & Related papers (2023-12-12T11:00:39Z)
Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics. Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities. We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z)
SatDM: Synthesizing Realistic Satellite Image with Semantic Layout Conditioning using Diffusion Models [0.0]
Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated significant promise in synthesizing realistic images from semantic layouts. In this paper, a conditional DDPM model capable of taking a semantic map and generating high-quality, diverse, and correspondingly accurate satellite images is implemented. The effectiveness of our proposed model is validated using a meticulously labeled dataset introduced within the context of this study.
arXiv Detail & Related papers (2023-09-28T19:39:13Z)
Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds [76.52448276587707]
We propose Reconfigurable Voxels, a new approach to constructing representations from 3D point clouds. Specifically, we devise a biased random walk scheme, which adaptively covers each neighborhood with a fixed number of voxels. We find that this approach effectively improves the stability of voxel features, especially for sparse regions.
arXiv Detail & Related papers (2020-04-06T15:07:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.