SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
- URL: http://arxiv.org/abs/2502.01894v2
- Date: Wed, 26 Mar 2025 20:42:44 GMT
- Title: SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
- Authors: Goodarz Mehr, Azim Eskandarian
- Abstract summary: Bird's-eye view (BEV) perception has garnered significant attention in autonomous driving in recent years. SimBEV is a randomized synthetic data generation tool that is extensively configurable and scalable. SimBEV is used to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bird's-eye view (BEV) perception has garnered significant attention in autonomous driving in recent years, in part because the BEV representation facilitates multi-modal sensor fusion. BEV representation enables a variety of perception tasks, including BEV segmentation, which provides a concise view of the environment useful for planning a vehicle's trajectory. However, this representation is not fully supported by existing datasets, and creating new datasets for this purpose can be a time-consuming endeavor. To address this challenge, we introduce SimBEV. SimBEV is a randomized synthetic data generation tool that is extensively configurable and scalable, supports a wide array of sensors, incorporates information from multiple sources to capture accurate BEV ground truth, and enables a variety of perception tasks including BEV segmentation and 3D object detection. SimBEV is used to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios. SimBEV and the SimBEV dataset are open and available to the public.
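The abstract does not pin down a public API, but a configurable, randomized generator of this kind is typically driven by a scenario and sensor-rig specification. The sketch below is purely illustrative: the `make_scenario_config` helper and every field name in it are invented here to mirror the capabilities the abstract lists (randomized scenarios, a wide sensor array, BEV segmentation and 3D detection ground truth), not SimBEV's actual interface.

```python
# Hypothetical driver for a SimBEV-like generator. The config fields
# below are invented for illustration and do not reflect SimBEV's
# actual API; they only echo what the abstract says the tool supports.
import random

def make_scenario_config(seed: int) -> dict:
    """Build one randomized scenario: map, weather, traffic, sensors."""
    rng = random.Random(seed)
    return {
        "map": rng.choice(["town_a", "town_b", "town_c"]),
        "weather": rng.choice(["clear", "rain", "fog", "night"]),
        "num_vehicles": rng.randint(20, 120),
        "sensors": [
            {"type": "camera", "position": p}
            for p in ("front", "front_left", "front_right", "rear")
        ] + [
            {"type": "lidar", "channels": 64},
            {"type": "radar", "position": "front"},
        ],
        # Per-frame ground truth: BEV segmentation masks and 3D boxes.
        "ground_truth": ["bev_segmentation", "boxes_3d"],
    }

if __name__ == "__main__":
    # Each seed yields an independent scenario, so scaling up is a
    # matter of sweeping seeds (possibly across parallel workers).
    for seed in range(3):
        cfg = make_scenario_config(seed)
        print(cfg["map"], cfg["weather"], cfg["num_vehicles"])
```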
Related papers
- BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment [8.098296280937518]
We present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal.
By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment.
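As a rough illustration of pose-guided alignment, the sketch below warps a lidar BEV feature map into the camera BEV frame given a relative SE(2) pose (yaw plus planar translation) and fuses the two by concatenation. This is a minimal stand-in under assumed shapes, not BEVPose's actual architecture or supervision.

```python
# Minimal pose-guided alignment sketch: warp one BEV feature map into
# another BEV frame given a relative SE(2) pose, then fuse. Shapes and
# the translation convention are assumptions, not BEVPose's design.
import math
import torch
import torch.nn.functional as F

def warp_bev(feat: torch.Tensor, yaw: float, tx: float, ty: float) -> torch.Tensor:
    """Resample (B, C, H, W) BEV features into a pose-shifted frame.

    tx and ty are translations in normalized grid units, i.e. [-1, 1].
    """
    cos, sin = math.cos(yaw), math.sin(yaw)
    theta = torch.tensor([[[cos, -sin, tx],
                           [sin,  cos, ty]]], dtype=feat.dtype)
    grid = F.affine_grid(theta.expand(feat.size(0), -1, -1),
                         feat.shape, align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

cam_bev = torch.randn(1, 64, 200, 200)
lidar_bev = torch.randn(1, 64, 200, 200)
aligned = warp_bev(lidar_bev, yaw=0.05, tx=0.01, ty=0.0)
fused = torch.cat([cam_bev, aligned], dim=1)  # (1, 128, 200, 200)
```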
arXiv Detail & Related papers (2024-10-28T12:40:27Z)
- BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents [56.33989853438012]
We propose BEVWorld, a framework that transforms multimodal sensor inputs into a unified and compact Bird's Eye View latent space for holistic environment modeling.
The proposed world model consists of two main components: a multi-modal tokenizer and a latent BEV sequence diffusion model.
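The summary names the two components but not their internals; the skeleton below only mirrors that split, with a tokenizer that compresses camera and lidar BEV features into one latent map and a denoiser that operates over a short sequence of such latents. All shapes and layer choices are assumptions for illustration, not BEVWorld's design.

```python
# Skeleton of the two named components, with invented shapes: a
# tokenizer that compresses multi-modal BEV features into one latent
# map, and a denoiser over short sequences of such latents.
import torch
import torch.nn as nn

class MultiModalTokenizer(nn.Module):
    """Encode camera and lidar BEV features into a compact latent."""
    def __init__(self, cam_ch=64, lidar_ch=64, latent_ch=32):
        super().__init__()
        self.encode = nn.Conv2d(cam_ch + lidar_ch, latent_ch, 3, padding=1)

    def forward(self, cam_bev, lidar_bev):
        return self.encode(torch.cat([cam_bev, lidar_bev], dim=1))

class LatentSequenceDenoiser(nn.Module):
    """One denoising step over a latent BEV sequence (B, C, T, H, W)."""
    def __init__(self, latent_ch=32):
        super().__init__()
        self.net = nn.Conv3d(latent_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latents):
        return self.net(noisy_latents)  # predicted noise

tok, den = MultiModalTokenizer(), LatentSequenceDenoiser()
z = tok(torch.randn(1, 64, 100, 100), torch.randn(1, 64, 100, 100))
seq = z.unsqueeze(2).repeat(1, 1, 4, 1, 1)  # stand-in 4-frame history
eps = den(seq + torch.randn_like(seq))      # one denoising pass
```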
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- SimGen: Simulator-conditioned Driving Scene Generation [50.03358485083602]
We introduce a simulator-conditioned scene generation framework called SimGen. SimGen learns to generate diverse driving scenes by mixing data from the simulator and the real world. It achieves superior generation quality and diversity while preserving controllability based on the text prompt and the layout pulled from a simulator.
arXiv Detail & Related papers (2024-06-13T17:58:32Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from varied unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
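One way to read "exploiting the complementary nature" of the two feature spaces is a consistency objective on unlabeled target data; the sketch below shows such a loss, assuming both heads' class predictions have already been rendered into a common BEV grid. This is a hedged illustration, not DA-BEV's actual objective.

```python
# Illustrative cross-view agreement loss for unlabeled target data,
# assuming image-view and BEV heads both output per-cell class logits
# on a shared BEV grid. Not DA-BEV's actual objective.
import torch
import torch.nn.functional as F

def cross_view_consistency(img_logits: torch.Tensor,
                           bev_logits: torch.Tensor) -> torch.Tensor:
    """Symmetrized KL between two (B, K, H, W) class predictions."""
    p = F.log_softmax(img_logits, dim=1)
    q = F.log_softmax(bev_logits, dim=1)
    return 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean") +
                  F.kl_div(q, p.exp(), reduction="batchmean"))

loss = cross_view_consistency(torch.randn(2, 10, 50, 50),
                              torch.randn(2, 10, 50, 50))
```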
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection [46.92706423094971]
We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features.
We also propose BEV-Paste, an effective data augmentation strategy that closely matches the semantic-aware BEV features.
Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
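A minimal sketch of the pooling idea: per-pixel image features are down-weighted by a predicted background probability before being sum-pooled into BEV cells. This simplifies SA-BEVPool considerably (the real method also couples this with depth estimation, and the shapes here are invented), but it shows the filtering mechanism the summary describes.

```python
# Simplified semantic-aware pooling: per-pixel features are scaled by
# a predicted foreground probability before being sum-pooled into BEV
# cells. Shapes are invented, and the real SA-BEVPool also couples
# this with depth estimation.
import torch

def semantic_aware_pool(feats, seg_logits, bev_idx, grid_hw):
    """feats: (N, C) pixel features; seg_logits: (N,) foreground
    logits; bev_idx: (N,) flat BEV cell index per pixel."""
    fg = torch.sigmoid(seg_logits).unsqueeze(1)  # foreground score
    weighted = feats * fg                        # suppress background
    cells = grid_hw[0] * grid_hw[1]
    bev = torch.zeros(cells, feats.size(1))
    bev.index_add_(0, bev_idx, weighted)         # sum-pool per cell
    return bev.t().reshape(feats.size(1), *grid_hw)

N, C, H, W = 1000, 64, 128, 128
bev = semantic_aware_pool(torch.randn(N, C), torch.randn(N),
                          torch.randint(0, H * W, (N,)), (H, W))
print(bev.shape)  # torch.Size([64, 128, 128])
```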
arXiv Detail & Related papers (2023-07-21T10:28:19Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations grow more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
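Problem (a) is the most mechanical of the three; the sketch below shows the standard geometric step behind many view-transformation schemes: lift a pixel to a 3D point with an estimated depth, map it into the ego frame, and bin it into a BEV cell. The intrinsics, extrinsics, and grid bounds here are illustrative values.

```python
# Geometric core of problem (a): lift a pixel to 3D with an estimated
# depth, map it into the ego frame, and bin it into a BEV cell. The
# intrinsics, extrinsics, and grid bounds below are illustrative.
import numpy as np

def pixel_to_bev_cell(u, v, depth, K, R, t,
                      x_min=-50.0, y_min=-50.0, resolution=0.5):
    """Return the (row, col) BEV cell hit by pixel (u, v) at `depth`."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = depth * ray           # 3D point in the camera frame
    p_ego = R @ p_cam + t         # camera-to-ego rigid transform
    col = int((p_ego[0] - x_min) / resolution)
    row = int((p_ego[1] - y_min) / resolution)
    return row, col

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
print(pixel_to_bev_cell(700, 400, 12.0, K, np.eye(3), np.zeros(3)))
```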
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [39.253627257740085]
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems.
We present a new framework termed BEVFormer, which learns unified BEV representations with transformers to support multiple autonomous driving perception tasks.
We show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions.
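The core mechanism is a learnable grid of BEV queries that cross-attends to multi-camera image features. The simplified sketch below keeps that structure but substitutes plain multi-head attention for BEVFormer's deformable spatial and temporal attention, so it is a readability aid rather than the paper's implementation.

```python
# Simplified BEV-query decoder: a learnable grid of queries attends to
# flattened multi-camera features. Plain multi-head attention stands
# in for BEVFormer's deformable spatial/temporal attention.
import torch
import torch.nn as nn

class BEVQueryDecoder(nn.Module):
    def __init__(self, dim=256, bev_h=50, bev_w=50):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8,
                                          batch_first=True)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, cam_feats):  # (B, N_tokens, dim), all cameras
        q = self.bev_queries.unsqueeze(0).expand(cam_feats.size(0), -1, -1)
        bev, _ = self.attn(q, cam_feats, cam_feats)
        return bev.transpose(1, 2).reshape(
            -1, bev.size(-1), self.bev_h, self.bev_w)

dec = BEVQueryDecoder()
out = dec(torch.randn(2, 6 * 375, 256))  # e.g. 6 cameras x 375 tokens
print(out.shape)  # torch.Size([2, 256, 50, 50])
```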
arXiv Detail & Related papers (2022-03-31T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.