A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving
- URL: http://arxiv.org/abs/2506.17004v1
- Date: Fri, 20 Jun 2025 13:58:10 GMT
- Title: A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving
- Authors: Hanlin Wu, Pengfei Lin, Ehsan Javanmardi, Naren Bao, Bo Qian, Hao Si, Manabu Tsukada
- Abstract summary: 3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving. We augment an existing collaborative perception dataset by replaying it in CARLA with a high-resolution semantic voxel sensor. We develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation.
- Score: 3.6538681992157604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving, providing a voxel-level representation of both geometric details and semantic categories. However, the perception capability of a single vehicle is inherently constrained by occlusion, restricted sensor range, and narrow viewpoints. To address these limitations, collaborative perception enables the exchange of complementary information, thereby enhancing the completeness and accuracy of perception. In the absence of a dedicated dataset for collaborative 3D semantic occupancy prediction, we augment an existing collaborative perception dataset by replaying it in CARLA with a high-resolution semantic voxel sensor to provide dense and comprehensive occupancy annotations. In addition, we establish benchmarks with varying prediction ranges designed to systematically assess the impact of spatial extent on collaborative prediction. We further develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation. Experimental results demonstrate that our baseline model consistently outperforms single-agent models, with increasing gains observed as the prediction range expands.
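The two fusion steps named in the abstract lend themselves to a brief illustration. Below is a minimal PyTorch sketch of that kind of inter-agent fusion: a cooperating agent's BEV feature map is warped into the ego coordinate frame (spatial alignment), and the aligned maps are merged per location with learned attention weights (attention aggregation). The tensor shapes, the 2D BEV simplification, the helper names (`warp_to_ego`, `AttentionAggregation`), and the use of `affine_grid`/`grid_sample` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not from the paper): spatial alignment of a
# neighboring agent's BEV feature map into the ego frame, followed by
# per-location attention aggregation across agents.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_to_ego(agent_feat, ego_to_agent, bev_half_extent):
    """Resample an agent's BEV features (B, C, H, W) onto the ego grid.

    ego_to_agent: (B, 2, 3) affine sampling transform from ego BEV coordinates
        into the agent's BEV coordinates (inverse of the agent-to-ego pose),
        with the translation expressed in meters.
    bev_half_extent: half-size of the square BEV grid in meters.
    """
    B, C, H, W = agent_feat.shape
    theta = ego_to_agent.clone()
    # affine_grid works in normalized [-1, 1] coordinates, so scale the
    # metric translation by the BEV half-extent.
    theta[:, :, 2] = theta[:, :, 2] / bev_half_extent
    grid = F.affine_grid(theta, size=(B, C, H, W), align_corners=False)
    return F.grid_sample(agent_feat, grid, align_corners=False)


class AttentionAggregation(nn.Module):
    """Softmax attention over agents at every BEV location."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                      # feats: (B, N, C, H, W)
        B, N, C, H, W = feats.shape
        scores = self.score(feats.flatten(0, 1)).view(B, N, 1, H, W)
        weights = scores.softmax(dim=1)            # weight each agent per cell
        return (weights * feats).sum(dim=1)        # fused: (B, C, H, W)


if __name__ == "__main__":
    ego_feat = torch.randn(1, 64, 128, 128)
    neighbor_feat = torch.randn(1, 64, 128, 128)
    # Identity rotation with a 10 m lateral offset between the two agents.
    ego_to_agent = torch.tensor([[[1.0, 0.0, 10.0],
                                  [0.0, 1.0, 0.0]]])
    aligned = warp_to_ego(neighbor_feat, ego_to_agent, bev_half_extent=51.2)
    fused = AttentionAggregation(channels=64)(torch.stack([ego_feat, aligned], dim=1))
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

In a full model the attention scores would typically condition on richer cues (e.g., per-agent confidence or relative pose) and the alignment would act on 3D voxel features; this 2D version only preserves the structure of the two steps described in the abstract.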
Related papers
- OccLE: Label-Efficient 3D Semantic Occupancy Prediction [48.50138308129873]
3D semantic occupancy prediction offers intuitive and efficient scene understanding. Existing approaches rely either on full supervision or on self-supervision, which provides limited guidance and yields suboptimal performance. We propose OccLE, a label-efficient 3D semantic occupancy prediction framework that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations.
arXiv Detail & Related papers (2025-05-27T01:41:28Z) - TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in robotics and autonomous driving environment perception. Existing occupancy prediction tasks are modeled using voxel- or point-cloud-based approaches. We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles [13.167432547990487]
We introduce the first method for collaborative 3D semantic occupancy prediction.
It improves local 3D semantic occupancy predictions by hybrid fusion of semantic and occupancy task features.
Our models, anchored on semantic occupancy, outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications.
arXiv Detail & Related papers (2024-02-12T13:19:08Z) - JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, as well as state-of-the-art methods specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking [23.608125748229174]
We propose a generic generative neural system for multi-agent trajectory prediction involving heterogeneous agents.
The proposed system is evaluated on three public benchmark datasets for trajectory prediction.
arXiv Detail & Related papers (2021-02-18T02:25:35Z) - Social-WaGDAT: Interaction-aware Trajectory Prediction via Wasserstein Graph Double-Attention Network [29.289670231364788]
In this paper, we propose a generic generative neural system for multi-agent trajectory prediction.
We also employ an efficient kinematic constraint layer applied to vehicle trajectory prediction.
The proposed system is evaluated on three public benchmark datasets for trajectory prediction.
arXiv Detail & Related papers (2020-02-14T20:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.