BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection
- URL: http://arxiv.org/abs/2211.17126v2
- Date: Wed, 27 Mar 2024 10:50:54 GMT
- Title: BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection
- Authors: Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang,
- Abstract summary: Vision-centric bird's-eye-view (BEV) perception has shown promising potential in autonomous driving.
Recent works mainly focus on improving efficiency or accuracy but neglect the challenges that arise when the environment changes.
We propose a Multi-space Alignment Teacher-Student (MATS) framework to ease domain shift accumulation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-centric bird's-eye-view (BEV) perception has shown promising potential in autonomous driving. Recent works mainly focus on improving efficiency or accuracy but neglect the challenges that arise when the environment changes, resulting in severe degradation of transfer performance. For BEV perception, we identify the significant domain gaps present in typical real-world cross-domain scenarios and comprehensively address the Domain Adaptation (DA) problem for multi-view 3D object detection. Since BEV perception approaches are complicated and contain several components, the accumulation of domain shift across multiple geometric spaces (i.e., 2D, 3D voxel, BEV) makes BEV DA even more challenging. In this paper, we propose a Multi-space Alignment Teacher-Student (MATS) framework to ease this domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Geometric-space Aligned Student (GAS) model. DAT tactfully combines target lidar and reliable depth predictions to construct depth-aware information, extracting target domain-specific knowledge in the voxel and BEV feature spaces. It then transfers this multi-space domain knowledge to the student model. To jointly alleviate the domain shift, GAS projects multi-geometric-space features into a shared geometric embedding space and decreases the data-distribution distance between the two domains. To verify the effectiveness of our method, we conduct BEV 3D object detection experiments on three cross-domain scenarios and achieve state-of-the-art performance.
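The GAS idea described in the abstract, projecting features from the 2D, voxel, and BEV spaces into one shared embedding space and shrinking the distribution distance between source and target domains, can be sketched as follows. This is a hedged illustration, not the paper's implementation: the linear projection heads, the feature dimensions, and the use of an RBF-kernel maximum mean discrepancy (MMD) as the distribution distance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_mmd(x, y, sigma=8.0):
    """Biased (V-statistic) squared MMD between sample sets x, y
    under an RBF kernel; always nonnegative, zero when x == y."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

# One (assumed) linear projection head per geometric space:
# 2D image, 3D voxel, BEV -- dimensions are made up for illustration.
dims, embed_dim = (256, 128, 64), 32
heads = [rng.normal(size=(d, embed_dim)) / np.sqrt(d) for d in dims]

def alignment_loss(src_feats, tgt_feats):
    """Project each space's features into the shared embedding and
    sum the per-space source-vs-target distribution distances."""
    return sum(rbf_mmd(s @ W, t @ W)
               for W, s, t in zip(heads, src_feats, tgt_feats))

# Pooled per-sample features from each space, both domains.
src = [rng.normal(size=(8, d)) for d in dims]
tgt = [rng.normal(size=(8, d)) for d in dims]
print(alignment_loss(src, tgt))  # small nonnegative scalar
```

In training, this scalar would be minimized jointly with the detection loss so that the student's representations become domain-agnostic across all three geometric spaces.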
Related papers
- BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection
Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving.
Recent studies have prioritized efficiency or accuracy enhancements, yet the issue of domain shift has been overlooked.
We introduce an innovative geometric-aware teacher-student framework, BEVUDA++, to diminish this issue.
arXiv Detail & Related papers (2025-09-17T16:31:40Z)
- Multimodal 3D Object Detection on Unseen Domains
Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem.
We propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection.
We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
arXiv Detail & Related papers (2024-04-17T21:47:45Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
3D object detection from point clouds is crucial in safety-critical autonomous driving.
We propose a density-insensitive domain adaption framework to address the density-induced domain gap.
Experimental results on three widely adopted 3D object detection datasets demonstrate that our proposed domain adaption method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-04-19T06:33:07Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
We first analyze the causes of the domain gap for the MV3D-Det task.
To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera.
We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z)
- DA-CIL: Towards Domain Adaptive Class-Incremental 3D Object Detection
We propose a novel 3D domain adaptive class-incremental object detection framework, DA-CIL.
We design a novel dual-domain copy-paste augmentation method to construct multiple augmented domains for diversifying training distributions.
Experiments on various datasets demonstrate the effectiveness of the proposed method over baselines.
arXiv Detail & Related papers (2022-12-05T06:45:27Z)
- Geometry-Aware Network for Domain Adaptive Semantic Segmentation
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-labels refinement in the target domain.
Our model outperforms the state of the art on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z)
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.