Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving
- URL: http://arxiv.org/abs/2503.00675v2
- Date: Thu, 06 Mar 2025 05:59:08 GMT
- Title: Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving
- Authors: Wenke E, Chao Yuan, Li Li, Yixin Sun, Yona Falinie A. Gaus, Amir Atapour-Abarghouei, Toby P. Breckon
- Abstract summary: Dur360BEV is an autonomous driving dataset equipped with a high-resolution 128-channel 3D LiDAR and an RTK-refined GNSS/INS system. This dataset and benchmark address the challenges of generating Bird-Eye-View (BEV) maps using only a single spherical camera.
- Score: 16.771347109638775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Dur360BEV, a novel spherical camera autonomous driving dataset equipped with a high-resolution 128-channel 3D LiDAR and an RTK-refined GNSS/INS system, along with a benchmark architecture designed to generate Bird-Eye-View (BEV) maps using only a single spherical camera. This dataset and benchmark address the challenges of BEV generation in autonomous driving, particularly by reducing hardware complexity through the use of a single 360-degree camera instead of multiple perspective cameras. Within our benchmark architecture, we propose a novel spherical-image-to-BEV module that leverages spherical imagery and a refined sampling strategy to project features from 2D to 3D. Our approach also includes an innovative application of focal loss, specifically adapted to address the extreme class imbalance often encountered in BEV segmentation tasks, which demonstrates improved segmentation performance on the Dur360BEV dataset. The results show that our benchmark not only simplifies the sensor setup but also achieves competitive performance.
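The abstract names two concrete components: a spherical-image-to-BEV module that projects features from 2D to 3D, and a focal loss adapted to the extreme class imbalance of BEV segmentation. The paper's exact formulations are not reproduced here; the sketch below shows one plausible way to project BEV voxel centres into an equirectangular image for feature sampling, with the axis conventions and all function names assumed for illustration rather than taken from the paper.

```python
import math
import torch

def project_to_spherical(points, height, width):
    """Map 3D points in the ego frame to equirectangular pixel coords.

    points: (N, 3) tensor of (x, y, z) voxel centres on the BEV grid.
    Returns (N, 2) pixel coordinates (u, v) in a height x width image.
    Axis conventions here are an assumption, not the paper's calibration.
    """
    x, y, z = points.unbind(-1)
    r = points.norm(dim=-1).clamp(min=1e-6)
    yaw = torch.atan2(y, x)                       # azimuth in [-pi, pi]
    pitch = torch.asin((z / r).clamp(-1.0, 1.0))  # elevation in [-pi/2, pi/2]
    u = (0.5 - yaw / (2 * math.pi)) * width
    v = (0.5 - pitch / math.pi) * height
    return torch.stack([u, v], dim=-1)
```

A second sketch shows the standard focal loss applied per BEV cell; `alpha` and `gamma` are the usual defaults from the focal loss literature, not necessarily the values tuned for Dur360BEV.

```python
import torch
import torch.nn.functional as F

def bev_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss over a BEV segmentation grid.

    logits:  (B, C, H, W) raw per-class scores on the BEV grid
    targets: (B, C, H, W) binary ground truth per class
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # prob. of the true class
    loss = ce * (1 - p_t) ** gamma               # down-weight easy cells
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean()
```

In practice the (u, v) grid from the first sketch would be normalised to [-1, 1] and fed to torch.nn.functional.grid_sample to lift spherical image features onto the BEV grid.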
Related papers
- KD360-VoxelBEV: LiDAR and 360-degree Camera Cross Modality Knowledge Distillation for Bird's-Eye-View Segmentation [30.730703237135216]
We present the first cross-modality distillation framework specifically tailored for single-panoramic-camera Bird's-Eye-View (BEV) segmentation.
Our approach leverages a novel LiDAR image representation fused from range, intensity and ambient channels, together with a voxel-aligned view transformer.
During training, a high-capacity LiDAR and camera fusion Teacher network extracts both rich spatial and semantic features for cross-modality knowledge distillation.
arXiv Detail & Related papers (2025-12-17T11:00:00Z)
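The summary above names the teacher-student setup but not its objective. As a hedged illustration, cross-modality BEV distillation is often implemented as feature imitation plus temperature-scaled soft-response matching between teacher and student BEV maps; the function below is a generic sketch of that pattern, with all names and the equal loss weighting assumed rather than taken from the paper.

```python
import torch.nn.functional as F

def bev_distillation_loss(student_bev, teacher_bev, tau=2.0):
    """Generic teacher-student loss between BEV feature maps.

    student_bev, teacher_bev: (B, C, H, W) BEV features from the
    camera-only student and the frozen LiDAR+camera teacher.
    """
    teacher_bev = teacher_bev.detach()  # no gradients into the teacher
    # Feature imitation: match the BEV maps cell by cell.
    feat_loss = F.mse_loss(student_bev, teacher_bev)
    # Soft-response matching with temperature scaling (classic KD).
    s = F.log_softmax(student_bev.flatten(2) / tau, dim=1)
    t = F.softmax(teacher_bev.flatten(2) / tau, dim=1)
    kd_loss = F.kl_div(s, t, reduction="batchmean") * tau * tau
    return feat_loss + kd_loss
```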
- MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection [14.97413385915044]
We introduce MIC-BEV, a Transformer-based bird's-eye-view (BEV) perception framework for infrastructure-based multi-camera 3D object detection.
To support training and evaluation, we introduce M2I, a synthetic dataset for infrastructure-based object detection.
Experiments on both M2I and the real-world dataset RoScenes demonstrate that MIC-BEV achieves state-of-the-art performance in 3D object detection.
arXiv Detail & Related papers (2025-10-28T17:49:42Z)
- OnlineBEV: Recurrent Temporal Fusion in Bird's Eye View Representations for Multi-Camera 3D Perception [13.143625047012604]
Multi-view camera-based 3D perception can be conducted using bird's eye view (BEV) features obtained through perspective view-to-BEV transformations.
OnlineBEV combines BEV features over time using a recurrent structure.
OnlineBEV achieves 63.9% NDS on the nuScenes test set, recording state-of-the-art performance in the camera-only 3D object detection task.
arXiv Detail & Related papers (2025-07-11T14:48:59Z)
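OnlineBEV's recurrent structure is only named above. One rough way to realise recurrent temporal fusion is a gated (ConvGRU-like) update that blends each frame's BEV features into a persistent hidden state; the module below is a hypothetical sketch of that idea, and a full system would also warp the hidden state into the current frame using ego-motion before fusing.

```python
import torch
import torch.nn as nn

class RecurrentBEVFusion(nn.Module):
    """Gated recurrent fusion of per-frame BEV features (illustrative)."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, bev_t, hidden):
        # bev_t:  (B, C, H, W) current-frame BEV features
        # hidden: (B, C, H, W) fused state carried over from past frames
        z = torch.sigmoid(self.gate(torch.cat([bev_t, hidden], dim=1)))
        return z * bev_t + (1 - z) * hidden  # becomes the next hidden state
```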
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z)
- HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection [34.72603963887331]
HV-BEV is a novel approach that decouples feature sampling in the BEV grid query paradigm into horizontal feature aggregation and vertical adaptive height-aware reference point sampling.
Our best-performing model achieves a remarkable 50.5% mAP and 59.8% NDS on the nuScenes testing set.
arXiv Detail & Related papers (2024-12-25T11:49:14Z)
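HV-BEV's vertical branch is described only as adaptive height-aware reference point sampling. A minimal sketch of that idea, assuming image features have already been gathered at K heights per BEV query (the camera projection and deformable sampling are omitted), is to let each query predict a weight per height and pool accordingly; all names here are hypothetical.

```python
import torch
import torch.nn as nn

class HeightAwarePooling(nn.Module):
    """Weight features sampled at K heights per BEV query (sketch)."""

    def __init__(self, dim, num_heights=4):
        super().__init__()
        self.height_logits = nn.Linear(dim, num_heights)

    def forward(self, query, sampled_feats):
        # query: (N, dim) BEV queries
        # sampled_feats: (N, K, dim) features gathered at K heights
        w = torch.softmax(self.height_logits(query), dim=-1)  # (N, K)
        return (w.unsqueeze(-1) * sampled_feats).sum(dim=1)   # (N, dim)
```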
- RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection.
RaCFormer achieves superior results of 64.9% mAP and 70.2% NDS on nuScenes, even outperforming several LiDAR-based detectors.
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
- OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping [25.801868221496473]
OneBEV is a novel BEV semantic mapping approach using merely a single panoramic image as input.
A distortion-aware module termed Mamba View Transformation (MVT) is specifically designed to handle the spatial distortions in panoramas.
This work advances BEV semantic mapping in autonomous driving, paving the way for more advanced and reliable autonomous systems.
arXiv Detail & Related papers (2024-09-20T21:33:53Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from diverse unlabelled target data, is far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework.
We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D.
For the first time, the vehicle AP of a camera-based model reaches 80% on DAIR-V2X-I under the easy setting.
arXiv Detail & Related papers (2023-10-04T13:38:53Z)
- BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity in its network design, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z)
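BEVTrack's key idea, per the summary, is regressing target motion directly in BEV. A hypothetical minimal head in that spirit concatenates BEV feature crops around the target from consecutive frames and predicts a planar motion update; the architecture and output parameterisation below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class BEVMotionHead(nn.Module):
    """Regress (dx, dy, dyaw) from consecutive BEV crops (sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 3),  # dx, dy, dyaw of the tracked target
        )

    def forward(self, bev_prev, bev_curr):
        # Both inputs: (B, C, H, W) BEV crops centred on the last target pose.
        return self.net(torch.cat([bev_prev, bev_curr], dim=1))
```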
- BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation [14.606324706328106]
We propose a dual-branch framework to generate LiDAR and camera BEV, then perform an adaptive modality fusion.
A LiDAR-Guided View Transformer (LGVT) is designed to effectively obtain the camera representation in BEV space.
Our framework dubbed BEVFusion4D achieves state-of-the-art results in 3D object detection.
arXiv Detail & Related papers (2023-03-30T02:18:07Z)
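"Adaptive modality fusion" above is not specified further; one common reading is a learned per-cell gate that decides how much to trust each modality's BEV map. The module below sketches that pattern under the same illustrative-only caveat as the other examples.

```python
import torch
import torch.nn as nn

class AdaptiveBEVFusion(nn.Module):
    """Per-cell gated fusion of LiDAR and camera BEV features (sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, lidar_bev, cam_bev):
        # Gate g in (0, 1) picks a per-cell, per-channel modality mix.
        g = self.gate(torch.cat([lidar_bev, cam_bev], dim=1))
        return g * lidar_bev + (1 - g) * cam_bev
```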
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
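The range-view (RV) representation underlying FCOS-LiDAR is the standard spherical projection of a LiDAR sweep into a 2D range image. The helper below is a conventional version of that projection, as used by range-image detectors generally; the image size and vertical field of view are placeholders, not the paper's sensor settings.

```python
import numpy as np

def points_to_range_image(points, h=64, w=2048, fov_up=15.0, fov_down=-25.0):
    """Project a LiDAR point cloud (N, 3) into an (h, w) range image.

    Rows index elevation, columns azimuth; each pixel stores range.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / np.maximum(r, 1e-6))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    v = np.clip((fu - pitch) / (fu - fd) * h, 0, h - 1).astype(int)
    # Fill far-to-near so the nearest return wins at each pixel.
    order = np.argsort(-r)
    img = np.full((h, w), -1.0, dtype=np.float32)
    img[v[order], u[order]] = r[order]
    return img
```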
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M$^2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$^2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
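The unified model is described only at a high level. As a schematic of the joint-task idea, a single BEV encoder can feed both a detection head and a map-segmentation head so one forward pass serves both tasks; everything below (layer choices, head shapes) is illustrative rather than M^2BEV's actual architecture.

```python
import torch.nn as nn

class UnifiedBEVModel(nn.Module):
    """One shared BEV encoder, two task heads (sketch)."""

    def __init__(self, channels, num_det_classes, num_map_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
        )
        self.det_head = nn.Conv2d(channels, num_det_classes, 1)
        self.seg_head = nn.Conv2d(channels, num_map_classes, 1)

    def forward(self, bev):
        feat = self.encoder(bev)  # shared computation for both tasks
        return self.det_head(feat), self.seg_head(feat)
```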
This list is automatically generated from the titles and abstracts of the papers on this site.