Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception
- URL: http://arxiv.org/abs/2409.05834v1
- Date: Mon, 9 Sep 2024 17:40:30 GMT
- Title: Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception
- Authors: Lei He, Qiaoyi Wang, Honglin Sun, Qing Xu, Bolin Gao, Shengbo Eben Li, Jianqiang Wang, Keqiang Li
- Abstract summary: We propose a fine-tuning method for BEV perception networks based on visual 2D semantic perception.
Considering the maturity and development of 2D perception technologies, our method significantly reduces the dependency on high-cost BEV ground truths.
- Score: 20.875243604623723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual bird's eye view (BEV) perception, due to its excellent perceptual capabilities, is progressively replacing costly LiDAR-based perception systems, especially in the realm of urban intelligent driving. However, this type of perception still relies on LiDAR data to construct ground truth databases, a process that is both cumbersome and time-consuming. Moreover, most mass-produced autonomous driving systems are only equipped with surround camera sensors and lack LiDAR data for precise annotation. To tackle this challenge, we propose a fine-tuning method for BEV perception networks based on visual 2D semantic perception, aimed at enhancing the model's generalization capabilities in new scene data. Considering the maturity and development of 2D perception technologies, our method significantly reduces the dependency on high-cost BEV ground truths and shows promising industrial application prospects. Extensive experiments and comparative analyses conducted on the nuScenes and Waymo public datasets demonstrate the effectiveness of our proposed method.
Related papers
- BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment [8.098296280937518]
We present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal.
By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment.
arXiv Detail & Related papers (2024-10-28T12:40:27Z)
- Robust Bird's Eye View Segmentation by Adapting DINOv2 [3.236198583140341]
We adapt a vision foundation model, DINOv2, to BEV estimation using Low Rank Adaptation (LoRA).
Our experiments show increased robustness of BEV perception under various corruptions.
We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and faster convergence during training.
arXiv Detail & Related papers (2024-09-16T12:23:35Z)
- Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving [52.808273563372126]
This paper proposes a novel hierarchical BEV perception paradigm, aiming to provide a library of fundamental perception modules and a user-friendly graphical interface.
We adopt a Pretrain-Finetune strategy to make effective use of large-scale public datasets and streamline development processes.
We also present a Multi-Module Learning (MML) approach, enhancing performance through synergistic and iterative training of multiple models.
arXiv Detail & Related papers (2024-07-17T11:17:20Z)
- CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow [20.550935390111686]
We introduce CLIP-BEVFormer, a novel approach to enhance the multi-view image-derived BEV backbones with ground truth information flow.
We conduct extensive experiments on the challenging nuScenes dataset and showcase significant and consistent improvements over the SOTA.
arXiv Detail & Related papers (2024-03-13T19:21:03Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the nuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios [12.079195812249747]
Current self-supervised depth estimation methods grapple with several limitations.
We present BEVScope, an innovative approach to self-supervised depth estimation.
We propose an adaptive loss function, specifically designed to mitigate the complexities associated with moving objects.
arXiv Detail & Related papers (2023-06-20T15:16:35Z)
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision [101.36648828734646]
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
arXiv Detail & Related papers (2022-11-18T18:59:48Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)