Related papers: SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset

SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset

URL: http://arxiv.org/abs/2502.01894v1
Date: Tue, 04 Feb 2025 00:00:06 GMT
Title: SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Authors: Goodarz Mehr, Azim Eskandarian,
Abstract summary: Bird's-eye view (BEV) perception for autonomous driving has garnered significant attention in recent years.<n>We introduce SimBEV, a synthetic data generation tool that incorporates information from multiple sources to capture accurate BEV ground truth data.<n>We use SimBEV to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.
Score: 101.51012770913627
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bird's-eye view (BEV) perception for autonomous driving has garnered significant attention in recent years, in part because BEV representation facilitates the fusion of multi-sensor data. This enables a variety of perception tasks including BEV segmentation, a concise view of the environment that can be used to plan a vehicle's trajectory. However, this representation is not fully supported by existing datasets, and creation of new datasets can be a time-consuming endeavor. To address this problem, in this paper we introduce SimBEV, an extensively configurable and scalable randomized synthetic data generation tool that incorporates information from multiple sources to capture accurate BEV ground truth data, supports a comprehensive array of sensors, and enables a variety of perception tasks including BEV segmentation and 3D object detection. We use SimBEV to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.

Related papers

BEV-VLM: Trajectory Planning via Unified BEV Abstraction [6.603679803036061]
This paper introduces a novel framework for trajectory planning in autonomous driving that leverages Vision-Language Models (VLMs) with Bird's-Eye View (BEV) feature maps as visual inputs.<n>Our method utilizes highly compressed and informative BEV representations, which are generated by fusing multi-modal sensor data (e.g., camera and LiDAR) and aligning them with HD Maps.<n> Experimental results on the nuScenes dataset demonstrate 44.8% improvements in planning accuracy and complete collision avoidance.
arXiv Detail & Related papers (2025-09-27T07:13:55Z)
BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment [8.098296280937518]
We present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal. By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment.
arXiv Detail & Related papers (2024-10-28T12:40:27Z)
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents [56.33989853438012]
We propose BEVWorld, a framework that transforms multimodal sensor inputs into a unified and compact Bird's Eye View latent space for holistic environment modeling. The proposed world model consists of two main components: a multi-modal tokenizer and a latent BEV sequence diffusion model.
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
SimGen: Simulator-conditioned Driving Scene Generation [50.03358485083602]
We introduce a simulator-conditioned scene generation framework called SimGen.<n>SimGen learns to generate diverse driving scenes by mixing data from the simulator and the real world.<n>It achieves superior generation quality and diversity while preserving controllability based on the text prompt and the layout pulled from a simulator.
arXiv Detail & Related papers (2024-06-13T17:58:32Z)
DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. Unsupervised domain adaptive BEV, which effective learning from various unlabelled target data, is far under-explored. We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection [46.92706423094971]
We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features. We also propose BEV-Paste, an effective data augmentation strategy that closely matches with semantic-aware BEV feature. Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-21T10:28:19Z)
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [39.253627257740085]
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. We present a new framework termed BEVFormer, which learns unified BEV representations with transformers to support multiple autonomous driving perception tasks. We show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions.
arXiv Detail & Related papers (2022-03-31T17:59:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.