BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment
- URL: http://arxiv.org/abs/2410.20969v1
- Date: Mon, 28 Oct 2024 12:40:27 GMT
- Title: BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment
- Authors: Mehdi Hosseinzadeh, Ian Reid
- Abstract summary: We present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal.
By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment.
- Score: 8.098296280937518
- Abstract: In the field of autonomous driving and mobile robotics, there has been a significant shift in the methods used to create Bird's Eye View (BEV) representations. This shift is characterised by using transformers and learning to fuse measurements from disparate vision sensors, mainly lidar and cameras, into a 2D planar ground-based representation. However, these learning-based methods for creating such maps often rely heavily on extensive annotated data, presenting notable challenges, particularly in diverse or non-urban environments where large-scale datasets are scarce. In this work, we present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal. This method notably reduces the dependence on costly annotated data. By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment. Our pretraining approach demonstrates promising performance in BEV map segmentation tasks, outperforming fully-supervised state-of-the-art methods, while necessitating only a minimal amount of annotated data. This development not only confronts the challenge of data efficiency in BEV representation learning but also broadens the potential for such techniques in a variety of domains, including off-road and indoor environments.
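The pose-as-supervision idea in the abstract can be illustrated with a minimal sketch: given the known relative pose between two sensors, one BEV feature grid is warped into the other's frame, and the agreement between the embeddings is scored. Everything below (the function name, grid layout, nearest-neighbour warp, and MSE score) is illustrative and is not BEVPose's actual implementation:

```python
import numpy as np

def bev_alignment_score(cam_bev, lidar_bev, yaw, t, cell=0.5):
    """Warp a lidar BEV feature grid into the camera BEV frame using the
    known relative sensor pose (planar yaw rotation + 2D translation in
    metres), then score alignment with a mean-squared error.
    All names and the scoring choice are illustrative."""
    H, W, _ = lidar_bev.shape
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    # Cell-centre coordinates in metres, origin at the grid centre.
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([(xs - W / 2) * cell, (ys - H / 2) * cell], axis=-1)
    # Apply the relative pose, then map back to integer cell indices
    # (nearest-neighbour resampling keeps the sketch short).
    warped = coords @ R.T + t
    ix = np.round(warped[..., 0] / cell + W / 2).astype(int)
    iy = np.round(warped[..., 1] / cell + H / 2).astype(int)
    valid = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    aligned = np.zeros_like(lidar_bev)
    aligned[valid] = lidar_bev[iy[valid], ix[valid]]
    # Pose acts as the supervisory signal: well-aligned embeddings
    # from the two modalities should drive this score toward zero.
    return float(np.mean((cam_bev[valid] - aligned[valid]) ** 2))
```

In the actual framework the aligned embeddings feed a learned pretraining objective; here a simple warp-and-compare stands in for that machinery to show where the pose enters.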
Related papers
- SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset [101.51012770913627]
Bird's-eye view (BEV) perception for autonomous driving has garnered significant attention in recent years.
We introduce SimBEV, a synthetic data generation tool that incorporates information from multiple sources to capture accurate BEV ground truth data.
We use SimBEV to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.
arXiv Detail & Related papers (2025-02-04T00:00:06Z) - Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation [26.245188807280684]
This paper addresses the dependency on learned positional encodings to correlate image and BEV feature map elements for transformer-based methods.
We propose leveraging epipolar geometric constraints to model the relationship between cameras and the BEV by Epipolar Attention Fields.
Experiments show that our method EAFormer outperforms previous BEV approaches by 2% mIoU for map semantic segmentation.
arXiv Detail & Related papers (2024-12-02T15:15:10Z) - Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation [11.074747442071729]
We introduce a novel content-aware multi-modal joint input pruning technique.
We validate the efficacy of our approach through extensive experiments on the nuScenes dataset.
arXiv Detail & Related papers (2024-10-09T03:30:00Z) - OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation.
arXiv Detail & Related papers (2024-07-18T03:48:22Z) - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from varied unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird's-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation [16.3996408206659]
We present a novel semi-supervised framework for visual BEV semantic segmentation to boost performance by exploiting unlabeled images during the training.
A consistency loss that makes full use of unlabeled data is then proposed to constrain the model on not only semantic prediction but also the BEV feature.
Experiments on the nuScenes and Argoverse datasets show that our framework can effectively improve prediction accuracy.
arXiv Detail & Related papers (2023-08-28T12:23:36Z) - Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z) - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in Bird's-Eye-View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z) - Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
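The epipolar geometric constraint leveraged by the EAFormer entry above can be made concrete with the standard two-view relation: a 3D point projected into two calibrated views yields pixels x1, x2 satisfying x2ᵀ F x1 = 0. The sketch below uses the textbook fundamental-matrix construction, not EAFormer's code; the intrinsics and pose values are made up for illustration:

```python
import numpy as np

def skew(t):
    """3x3 skew-symmetric matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def fundamental(K1, K2, R, t):
    """Fundamental matrix F = K2^-T [t]x R K1^-1 relating pixels in two
    views with relative rotation R and translation t (textbook form)."""
    E = skew(t) @ R                      # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Illustrative camera setup: shared intrinsics, pure sideways baseline.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([1.0, 0.0, 0.0])
X = np.array([2.0, 1.0, 5.0])            # a 3D point in view-1 coordinates
x1 = K @ X; x1 /= x1[2]                  # projection into view 1
x2 = K @ (R @ X + t); x2 /= x2[2]        # projection into view 2
F = fundamental(K, K, R, t)
residual = float(x2 @ F @ x1)            # zero up to floating-point error
```

Epipolar attention restricts each BEV query to image locations consistent with this constraint, instead of letting a learned positional encoding discover the camera geometry from data.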
This list is automatically generated from the titles and abstracts of the papers in this site.