Unlocking Past Information: Temporal Embeddings in Cooperative Bird's
Eye View Prediction
- URL: http://arxiv.org/abs/2401.14325v1
- Date: Thu, 25 Jan 2024 17:21:35 GMT
- Title: Unlocking Past Information: Temporal Embeddings in Cooperative Bird's
Eye View Prediction
- Authors: Dominik Rößle and Jeremias Gerner and Klaus Bogenberger and
Daniel Cremers and Stefanie Schmidtner and Torsten Schön
- Abstract summary: This paper introduces TempCoBEV, a temporal module designed to incorporate historical cues into current observations.
We show the efficacy of TempCoBEV and its capability to integrate historical cues into the current BEV map, improving predictions under optimal communication conditions by up to 2% and under communication failures by up to 19%.
- Score: 34.68695222573004
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate and comprehensive semantic segmentation of Bird's Eye View (BEV) is
essential for ensuring safe and proactive navigation in autonomous driving.
Although cooperative perception has exceeded the detection capabilities of
single-agent systems, prevalent camera-based algorithms in cooperative
perception neglect valuable information derived from historical observations.
This limitation becomes critical during sensor failures or communication issues
as cooperative perception reverts to single-agent perception, leading to
degraded performance and incomplete BEV segmentation maps. This paper
introduces TempCoBEV, a temporal module designed to incorporate historical cues
into current observations, thereby improving the quality and reliability of BEV
map segmentations. We propose an importance-guided attention architecture to
effectively integrate temporal information that prioritizes relevant properties
for BEV map segmentation. TempCoBEV is an independent temporal module that
seamlessly integrates into state-of-the-art camera-based cooperative perception
models. We demonstrate through extensive experiments on the OPV2V dataset that
TempCoBEV performs better than non-temporal models in predicting current and
future BEV map segmentations, particularly in scenarios involving communication
failures. We show the efficacy of TempCoBEV and its capability to integrate
historical cues into the current BEV map, improving predictions under optimal
communication conditions by up to 2% and under communication failures by up to
19%. The code will be published on GitHub.
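The abstract describes an importance-guided attention architecture that integrates historical cues into the current BEV map. The paper's code is not yet published, so the following is only a minimal illustrative sketch of the general idea (cross-attention from the current BEV frame into a historical one, gated by a per-cell importance weight); all function and variable names are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_fusion(current_bev, history_bev, w_q, w_k, w_v):
    """Hypothetical sketch of importance-guided temporal attention.

    current_bev: (N, C) flattened BEV cells from the current frame
    history_bev: (N, C) flattened BEV cells from a past frame
    w_q, w_k, w_v: (C, C) projection matrices
    """
    q = current_bev @ w_q          # queries come from the current frame
    k = history_bev @ w_k          # keys come from history
    v = history_bev @ w_v          # values come from history
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)   # (N, N) attention map
    # Per-cell importance gate (an assumption): cells with weak current
    # evidence admit more historical information, strong cells less.
    importance = 1.0 / (1.0 + np.exp(current_bev.mean(axis=-1, keepdims=True)))
    # Residual fusion: current features plus gated historical context.
    return current_bev + importance * (attn @ v)
```

Under this reading, the module is a drop-in residual block: a cooperative perception model produces `current_bev`, and the fused output replaces it before the segmentation head, so the backbone itself needs no changes.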
Related papers
- VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of the different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
- OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation.
arXiv Detail & Related papers (2024-07-18T03:48:22Z)
- BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight [30.45553559416835]
We propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m.
Our approach is applicable to common BEV architectures and can achieve excellent results by incorporating information derived from SD maps.
arXiv Detail & Related papers (2024-07-11T14:15:48Z)
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
- TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation [9.723276622743473]
We develop a novel temporal BEV encoder, TempBEV, which integrates aggregated temporal information from both latent spaces.
Empirical evaluation on the NuScenes dataset shows a significant improvement by TempBEV over the baseline for 3D object detection and BEV segmentation.
arXiv Detail & Related papers (2024-04-17T23:49:00Z)
- BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation [22.870994478494566]
We introduce BEVCar, a novel approach for joint BEV object and map segmentation.
The core novelty of our approach lies in first learning a point-based encoding of raw radar data.
We show that incorporating radar information significantly enhances robustness in challenging environmental conditions.
arXiv Detail & Related papers (2024-03-18T13:14:46Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion [0.0]
Recent approaches in cooperative perception only share single sensor information such as cameras or LiDAR.
We present a framework, called CoBEVFusion, that fuses LiDAR and camera data to create a Bird's-Eye View (BEV) representation.
Our framework was evaluated on the cooperative perception dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object detection.
arXiv Detail & Related papers (2023-10-09T17:52:26Z)
- Generating Evidential BEV Maps in Continuous Driving Space [13.073542165482566]
We propose a complete probabilistic model named GevBEV.
It interprets the 2D driving space as a probabilistic Bird's Eye View (BEV) map with point-based spatial Gaussian distributions.
GevBEV helps reduce communication overhead by selecting only the most important information to share from the learned uncertainty.
arXiv Detail & Related papers (2023-02-06T17:05:50Z)
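The GevBEV summary above says communication overhead is reduced by sharing only the most important information according to learned uncertainty. A minimal sketch of that selection step, assuming each BEV point carries a Gaussian variance as its uncertainty measure (the function name and threshold are illustrative, not from the paper):

```python
import numpy as np

def select_to_share(means, variances, var_threshold=0.5):
    """Keep only BEV points whose predicted variance is below a threshold,
    i.e. transmit confident observations and drop uncertain ones.
    Illustrative sketch; the actual GevBEV criterion may differ.
    """
    confident = variances < var_threshold        # boolean mask over points
    return means[confident], variances[confident]
```

The bandwidth saving then scales with how many points fall under the threshold, which trades completeness of the shared map against communication cost.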
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.