SWA-SOP: Spatially-aware Window Attention for Semantic Occupancy Prediction in Autonomous Driving
- URL: http://arxiv.org/abs/2506.18785v1
- Date: Mon, 23 Jun 2025 15:54:28 GMT
- Title: SWA-SOP: Spatially-aware Window Attention for Semantic Occupancy Prediction in Autonomous Driving
- Authors: Helin Cao, Rafael Materla, Sven Behnke
- Abstract summary: Spatially-aware Window Attention (SWA) is a novel mechanism that incorporates local spatial context into attention. SWA significantly improves scene completion and achieves state-of-the-art results on LiDAR-based SOP benchmarks. We further validate its generality by integrating SWA into a camera-based SOP pipeline.
- Score: 16.320467417627277
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perception systems in autonomous driving rely on sensors such as LiDAR and cameras to perceive the 3D environment. However, due to occlusions and data sparsity, these sensors often fail to capture complete information. Semantic Occupancy Prediction (SOP) addresses this challenge by inferring both occupancy and semantics of unobserved regions. Existing transformer-based SOP methods lack explicit modeling of spatial structure in attention computation, resulting in limited geometric awareness and poor performance in sparse or occluded areas. To this end, we propose Spatially-aware Window Attention (SWA), a novel mechanism that incorporates local spatial context into attention. SWA significantly improves scene completion and achieves state-of-the-art results on LiDAR-based SOP benchmarks. We further validate its generality by integrating SWA into a camera-based SOP pipeline, where it also yields consistent gains across modalities.
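The abstract does not spell out the exact form of SWA's spatial term, but the core idea, attention restricted to local windows with a geometry-dependent bias on the logits, can be sketched as follows. The distance penalty, the `tau` temperature, and all function names here are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_window_attention(feats, coords, window_ids, tau=1.0):
    """Attention restricted to local windows, with a distance-based
    bias subtracted from the logits (illustrative stand-in for SWA).

    feats      : (N, D) voxel features
    coords     : (N, 3) voxel coordinates
    window_ids : (N,)   window assignment per voxel
    """
    N, D = feats.shape
    out = np.zeros_like(feats)
    for w in np.unique(window_ids):
        idx = np.where(window_ids == w)[0]
        f, c = feats[idx], coords[idx]
        logits = f @ f.T / np.sqrt(D)                  # plain dot-product attention
        dist = np.linalg.norm(c[:, None] - c[None, :], axis=-1)
        logits = logits - dist / tau                   # nearby voxels attend more strongly
        out[idx] = softmax(logits, axis=-1) @ f
    return out
```

As `tau` shrinks the spatial term dominates and each voxel attends mostly to itself; a real SWA layer would presumably learn the bias rather than use this fixed distance penalty.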
Related papers
- Backscatter Device-aided Integrated Sensing and Communication: A Pareto Optimization Framework [59.30060797118097]
Integrated sensing and communication (ISAC) systems potentially encounter significant performance degradation in densely obstructed urban non-line-of-sight scenarios. This paper proposes a backscatter device (BD)-assisted ISAC system, which leverages passive BDs naturally distributed in the environment for performance enhancement.
arXiv Detail & Related papers (2025-07-12T17:11:06Z) - A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving [3.6538681992157604]
3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving. We augment an existing collaborative perception dataset by replaying it in CARLA with a high-resolution semantic voxel sensor. We develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation.
arXiv Detail & Related papers (2025-06-20T13:58:10Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes with high FPS.
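The three unified operations named in the DriveTransformer summary can be illustrated as a simple composition of attention calls. The residual wiring, shapes, and function names below are hypothetical, since the summary does not specify them:

```python
import numpy as np

def attention(q, k, v):
    """Minimal single-head dot-product attention."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def drive_transformer_layer(task_q, sensor_feats, history_feats):
    """Hypothetical composition of the three unified operations."""
    x = task_q + attention(task_q, task_q, task_q)       # task self-attention
    x = x + attention(x, sensor_feats, sensor_feats)     # sensor cross-attention
    x = x + attention(x, history_feats, history_feats)   # temporal cross-attention
    return x
```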
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - EDENet: Echo Direction Encoding Network for Place Recognition Based on Ground Penetrating Radar [20.860547422960796]
Ground penetrating radar (GPR) based localization has gained significant recognition in robotics. Existing methods are primarily focused on small-scale place recognition (PR), leaving the challenges of PR in large-scale maps unaddressed. We introduce learnable Gabor filters for the precise extraction of directional responses, coupled with a direction-aware attention mechanism for effective geometric encoding.
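The learnable Gabor filters mentioned for EDENet can be illustrated with a fixed (non-learned) Gabor kernel whose response peaks when image structure is aligned with its orientation; the parameter values and function names here are assumptions for the sketch:

```python
import numpy as np

def gabor_kernel(size=7, theta=0.0, lam=4.0, sigma=2.0, gamma=0.5):
    """2D Gabor kernel oriented at angle theta (radians):
    a Gaussian envelope modulating a cosine of wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

def directional_response(img, theta, size=7):
    """Valid cross-correlation of img with a Gabor kernel at theta."""
    k = gabor_kernel(size=size, theta=theta)
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out
```

A bank of such kernels at several orientations yields the per-direction responses that a direction-aware attention mechanism could then weight; in EDENet the kernel parameters would be learned rather than fixed.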
arXiv Detail & Related papers (2025-02-28T01:48:12Z) - DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models [18.342569823885864]
3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings.
Such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics.
We jointly predict unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation.
arXiv Detail & Related papers (2024-09-26T17:39:05Z) - OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising [49.86409475232849]
Trajectory prediction is fundamental in computer vision and autonomous driving.
Existing approaches in this field often assume precise and complete observational data.
We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z) - Efficient Real-time Smoke Filtration with 3D LiDAR for Search and Rescue with Autonomous Heterogeneous Robotic Systems [56.838297900091426]
Smoke and dust affect the performance of any mobile robotic platform due to such platforms' reliance on onboard perception systems.
This paper proposes a novel modular computation filtration pipeline based on intensity and spatial information.
arXiv Detail & Related papers (2023-08-14T16:48:57Z) - OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception [73.05425657479704]
We propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.
We extend the large-scale nuScenes dataset with dense semantic occupancy annotations.
Considering the complexity of surrounding occupancy perception, we propose the Cascade Occupancy Network (CONet) to refine the coarse prediction.
arXiv Detail & Related papers (2023-03-07T15:43:39Z) - USC: Uncompromising Spatial Constraints for Safety-Oriented 3D Object Detectors in Autonomous Driving [7.355977594790584]
We consider the safety-oriented performance of 3D object detectors in autonomous driving contexts. We present uncompromising spatial constraints (USC), which characterize a simple yet important localization requirement. We incorporate the quantitative measures into common loss functions to enable safety-oriented fine-tuning for existing models.
arXiv Detail & Related papers (2022-09-21T14:03:08Z) - Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy [48.51396198176273]
We propose an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments.
We detect and model tree trunks and ground planes from LiDAR data, which are associated across scans and used to constrain robot poses as well as tree trunk models.
A drift-compensation mechanism is designed to minimize the odometry drift using semantic SLAM outputs in real time, while maintaining planner optimality and controller stability.
arXiv Detail & Related papers (2021-09-14T07:24:53Z) - Towards robust sensing for Autonomous Vehicles: An adversarial perspective [82.83630604517249]
It is of primary importance that the resulting decisions are robust to perturbations.
Adversarial perturbations are purposefully crafted alterations of the environment or of the sensory measurements.
A careful evaluation of the vulnerabilities of their sensing system(s) is necessary in order to build and deploy safer systems.
arXiv Detail & Related papers (2020-07-14T05:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.