Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
- URL: http://arxiv.org/abs/2402.07635v2
- Date: Thu, 25 Apr 2024 08:15:56 GMT
- Title: Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
- Authors: Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll
- Abstract summary: We introduce the first method for collaborative 3D semantic occupancy prediction.
It improves local 3D semantic occupancy predictions by hybrid fusion of semantic and occupancy task features.
Our models anchored on semantic occupancy outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications.
- Score: 13.167432547990487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods typically employ 3D bounding boxes or bird's eye views as representations of the environment. However, these approaches fall short in offering a comprehensive 3D environmental prediction. To bridge this gap, we introduce the first method for collaborative 3D semantic occupancy prediction. Particularly, it improves local 3D semantic occupancy predictions by hybrid fusion of (i) semantic and occupancy task features, and (ii) compressed orthogonal attention features shared between vehicles. Additionally, due to the lack of a collaborative perception dataset designed for semantic occupancy prediction, we augment a current collaborative perception dataset to include 3D collaborative semantic occupancy labels for a more robust evaluation. The experimental findings highlight that: (i) our collaborative semantic occupancy predictions excel above the results from single vehicles by over 30%, and (ii) models anchored on semantic occupancy outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications, showcasing enhanced accuracy and enriched semantic-awareness in road environments.
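To make the hybrid fusion idea concrete, here is a minimal PyTorch sketch of combining (i) semantic and occupancy task features with (ii) a shared feature received from a nearby vehicle. The module name, channel sizes, and sigmoid gating are illustrative assumptions, not the authors' implementation; the shared feature is assumed to be already decompressed and warped into the ego frame.

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Hypothetical fusion of (i) local semantic/occupancy task features and
    (ii) a feature shared by another connected vehicle."""
    def __init__(self, c_sem, c_occ, c_msg, c_out):
        super().__init__()
        self.local_fuse = nn.Conv3d(c_sem + c_occ, c_out, kernel_size=1)
        self.msg_proj = nn.Conv3d(c_msg, c_out, kernel_size=1)
        self.gate = nn.Conv3d(2 * c_out, c_out, kernel_size=1)

    def forward(self, f_sem, f_occ, f_msg):
        # (i) task-level fusion of the semantic and occupancy branches
        local = self.local_fuse(torch.cat([f_sem, f_occ], dim=1))
        # (ii) agent-level fusion with the shared feature, assumed already
        # decompressed and warped into the ego coordinate frame
        shared = self.msg_proj(f_msg)
        w = torch.sigmoid(self.gate(torch.cat([local, shared], dim=1)))
        return w * local + (1.0 - w) * shared

# Toy usage with (B, C, Z, H, W) voxel features:
fuse = HybridFusion(c_sem=32, c_occ=16, c_msg=8, c_out=64)
out = fuse(torch.randn(1, 32, 16, 64, 64),
           torch.randn(1, 16, 16, 64, 64),
           torch.randn(1, 8, 16, 64, 64))
```

The learned gate lets the ego vehicle weigh its own evidence against the collaborator's at every voxel, one simple way to reconcile local and shared features.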
Related papers
- A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving [3.6538681992157604]
3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving.
We augment an existing collaborative perception dataset by replaying it in CARLA with a high-resolution semantic voxel sensor.
We develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation.
arXiv Detail & Related papers (2025-06-20T13:58:10Z)
- Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving [27.94544631535978]
Generative models learn the underlying data distribution and incorporate 3D scene priors.
Our experiments show that diffusion-based generative models outperform state-of-the-art discriminative approaches.
arXiv Detail & Related papers (2025-05-29T05:34:22Z)
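For the diffusion-based entry above, a bare-bones DDPM-style training step on a voxel occupancy grid might look as follows; the linear noise schedule and the `denoiser(x_t, t)` interface are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn.functional as F

def diffusion_occupancy_loss(denoiser, x0, T=1000):
    """x0: (B, C, D, H, W) occupancy volume; denoiser(x_t, t) predicts noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    beta = torch.linspace(1e-4, 0.02, T, device=x0.device)   # linear schedule
    a_bar = torch.cumprod(1.0 - beta, dim=0)[t].view(b, 1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # forward process
    return F.mse_loss(denoiser(x_t, t), noise)               # noise prediction
```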
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction.
We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images.
We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception.
Existing approaches model occupancy prediction using voxel- or point-cloud-based representations.
We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction [9.639795825672023]
Off-road environments are rich in geometric information, making them well suited to 3D semantic occupancy prediction tasks.
We introduce WildOcc, the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction tasks.
A ground truth generation pipeline is proposed in this paper, which employs a coarse-to-fine reconstruction to achieve a more realistic result.
arXiv Detail & Related papers (2024-10-21T09:02:40Z)
- Vision-based 3D occupancy prediction in autonomous driving: a review and outlook [19.939380586314673]
We introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task.
We conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects.
We present a summary of prevailing research trends and propose some inspiring future outlooks.
arXiv Detail & Related papers (2024-05-04T07:39:25Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a rendering-assisted distillation paradigm for 3D occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
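For the RadOcc entry above, the core mechanic of rendering-assisted distillation can be sketched by compositing per-voxel semantics into 2D maps for both student and teacher and aligning the renderings. The orthographic ray marching and KL alignment below are my simplifications, not RadOcc's actual pipeline.

```python
import torch
import torch.nn.functional as F

def render_semantics(occ_logits, sem_logits):
    """occ_logits: (B, D, H, W) occupancy; sem_logits: (B, K, D, H, W) classes."""
    alpha = occ_logits.sigmoid()                         # per-voxel opacity
    trans = torch.cumprod(1.0 - alpha + 1e-6, dim=1)     # transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    w = (alpha * trans).unsqueeze(1)                     # compositing weights
    return (w * sem_logits.softmax(dim=1)).sum(dim=2)    # (B, K, H, W)

def distill_loss(student_render, teacher_render):
    # Align student renderings with the teacher's (assumed alignment choice).
    return F.kl_div(student_render.clamp_min(1e-6).log(),
                    teacher_render, reduction="batchmean")
```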
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
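A common way to integrate semantic prediction confidence during mapping is to accumulate confidence-weighted class log-probabilities per voxel. The sketch below shows that generic recipe; it is an assumption in the spirit of the entry above, not the paper's exact update rule.

```python
import numpy as np

def integrate_observation(voxel_logits, voxel_ids, class_probs, conf, eps=1e-6):
    """voxel_logits: (V, K) accumulated per-voxel class log-scores.
    voxel_ids:    (N,) voxel index hit by each back-projected pixel/point.
    class_probs:  (N, K) per-point semantic softmax from the 2D network.
    conf:         (N,) confidence used to weight each observation."""
    update = conf[:, None] * np.log(class_probs + eps)  # weighted log-likelihoods
    np.add.at(voxel_logits, voxel_ids, update)          # scatter-add per voxel
    return voxel_logits.argmax(axis=1)                  # current per-voxel labels
```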
- 3D Pose Nowcasting: Forecast the Future to Improve the Present [65.65178700528747]
We propose a novel vision-based system leveraging depth data to accurately establish the 3D locations of skeleton joints.
We introduce the concept of Pose Nowcasting, denoting the capability of the proposed system to enhance its current pose estimation accuracy.
The experimental evaluation is conducted on two different datasets, providing accurate and real-time performance.
arXiv Detail & Related papers (2023-08-24T16:40:47Z)
- Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception [18.358998861454477]
Multi-agent collaborative perception, a potential application of vehicle-to-everything communication, could significantly improve the perception performance of autonomous vehicles over single-agent perception.
We propose SCOPE, a novel collaborative perception framework that aggregates awareness characteristics across agents in an end-to-end manner.
arXiv Detail & Related papers (2023-07-26T03:00:31Z)
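For the collaborative perception entry above, agent-wise feature aggregation can be illustrated with per-location attention over ego-aligned BEV features; the 1x1-conv projections and dot-product scoring are generic assumptions, not SCOPE's actual components.

```python
import torch
import torch.nn as nn

def cross_agent_fusion(feats, q_proj, k_proj):
    """feats: (N, C, H, W) ego-aligned BEV features from N agents (0 = ego)."""
    n, c, h, w = feats.shape
    q = q_proj(feats[:1])                           # ego query      (1, C, H, W)
    k = k_proj(feats)                               # per-agent keys (N, C, H, W)
    score = (q * k).sum(dim=1) / c ** 0.5           # similarity     (N, H, W)
    attn = score.softmax(dim=0).unsqueeze(1)        # normalise over agents
    return (attn * feats).sum(dim=0, keepdim=True)  # fused ego feature

# Toy usage: three agents, 64-channel BEV maps, 1x1-conv projections.
q_proj, k_proj = nn.Conv2d(64, 64, 1), nn.Conv2d(64, 64, 1)
fused = cross_agent_fusion(torch.randn(3, 64, 100, 100), q_proj, k_proj)
```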
- A Simple Framework for 3D Occupancy Estimation in Autonomous Driving [16.605853706182696]
We present a CNN-based framework designed to reveal several key factors for 3D occupancy estimation.
We also explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction.
arXiv Detail & Related papers (2023-03-17T15:57:14Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
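A bare-bones proxy for the pose- and joint-level uncertainty measures described above is the disagreement between the two output heads; treating head disagreement as the uncertainty measure is my simplification, not MRP-Net's derived measures.

```python
import torch

def head_disagreement(pose_a, pose_b):
    """pose_a, pose_b: (B, J, 3) joint predictions from the two heads."""
    joint_u = (pose_a - pose_b).norm(dim=-1)  # (B, J) joint-level uncertainty
    pose_u = joint_u.mean(dim=-1)             # (B,)  pose-level uncertainty
    return joint_u, pose_u
```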