Fugu-MT Paper Translation (Abstract): BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection

Paper overview: BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection

arxiv url: http://arxiv.org/abs/2509.14151v1
Date: Wed, 17 Sep 2025 16:31:40 GMT
Status: Translation complete
Last updated in system: 2025-09-18 18:41:50.920797
Title: BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection
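Title (reference translation): BEVUDA++: Geometry-Aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection
Authors: Rongyu Zhang, Jiaming Liu, Xiaoqi Li, Xiaowei Chi, Dan Wang, Li Du, Yuan Du, Shanghang Zhang
Abstract summary: Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving. Recent studies have prioritized improvements in efficiency or accuracy, while the problem of domain shift has been overlooked. This paper introduces BEVUDA++, a geometric-aware teacher-student framework, to mitigate this problem.
Reference score (independently computed attention level): 56.477525075806966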
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving. Recent studies have prioritized efficiency or accuracy enhancements, yet the issue of domain shift has been overlooked, leading to substantial performance degradation upon transfer. We identify major domain gaps in real-world cross-domain scenarios and initiate the first effort to address the Domain Adaptation (DA) challenge in multi-view 3D object detection for BEV perception. Given the complexity of BEV perception approaches with their multiple components, domain shift accumulation across multi-geometric spaces (e.g., 2D, 3D Voxel, BEV) poses a significant challenge for BEV domain adaptation. In this paper, we introduce an innovative geometric-aware teacher-student framework, BEVUDA++, to diminish this issue, comprising a Reliable Depth Teacher (RDT) and a Geometric Consistent Student (GCS) model. Specifically, RDT effectively blends target LiDAR with dependable depth predictions to generate depth-aware information based on uncertainty estimation, enhancing the extraction of Voxel and BEV features that are essential for understanding the target domain. To collaboratively reduce the domain shift, GCS maps features from multiple spaces into a unified geometric embedding space, thereby narrowing the gap in data distribution between the two domains. Additionally, we introduce a novel Uncertainty-guided Exponential Moving Average (UEMA) to further reduce error accumulation due to domain shifts informed by previously obtained uncertainty guidance. To demonstrate the superiority of our proposed method, we execute comprehensive experiments in four cross-domain scenarios, securing state-of-the-art performance in BEV 3D object detection tasks, e.g., 12.9% NDS and 9.5% mAP enhancement on Day-Night adaptation.
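Abstract (reference translation): Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving. Recent studies have prioritized improvements in efficiency or accuracy, but the problem of domain shift has been overlooked, which leads to substantial performance degradation upon transfer. We identify the major domain gaps in real-world cross-domain scenarios and make the first attempt to address the Domain Adaptation (DA) challenge in multi-view 3D object detection for BEV perception. Because BEV perception approaches are complex and consist of multiple components, the accumulation of domain shift across multiple geometric spaces (e.g., 2D, 3D Voxel, BEV) is a major challenge for BEV domain adaptation. To mitigate this problem, this paper introduces BEVUDA++, a geometric-aware teacher-student framework comprising a Reliable Depth Teacher (RDT) and a Geometric Consistent Student (GCS) model. Specifically, RDT blends target LiDAR with reliable depth predictions and generates depth-aware information based on uncertainty estimation, which strengthens the extraction of the Voxel and BEV features that are essential for understanding the target domain. To reduce the domain shift collaboratively, GCS maps features from multiple spaces into a unified geometric embedding space, narrowing the gap in data distribution between the two domains. In addition, we introduce a new Uncertainty-guided Exponential Moving Average (UEMA), which uses the previously obtained uncertainty guidance to further reduce the error accumulation caused by domain shift. To demonstrate the superiority of the proposed method, we run comprehensive experiments in four cross-domain scenarios and achieve state-of-the-art performance in BEV 3D object detection tasks, e.g., improvements of 12.9% NDS and 9.5% mAP on Day-Night adaptation.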
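
The abstract names an Uncertainty-guided Exponential Moving Average (UEMA) for the teacher-student update but does not give its update rule. As a rough illustration only, the sketch below shows a standard EMA teacher update and one hypothetical way an uncertainty value could modulate it: higher uncertainty pushes the momentum toward 1, so the teacher absorbs less from the student. The function names, the scalar `uncertainty` in [0, 1], and the linear momentum schedule are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> Params:
    """Standard EMA: teacher <- m * teacher + (1 - m) * student, per parameter."""
    return {
        name: [momentum * t + (1.0 - momentum) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def uncertainty_guided_ema_update(
    teacher: Params,
    student: Params,
    uncertainty: float,
    base_momentum: float = 0.999,
) -> Params:
    """Hypothetical uncertainty-guided variant (an assumption, not the paper's rule).

    uncertainty = 0 gives the plain EMA with base_momentum;
    uncertainty = 1 gives momentum 1, so the teacher is left unchanged.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    momentum = base_momentum + (1.0 - base_momentum) * uncertainty
    return ema_update(teacher, student, momentum)


if __name__ == "__main__":
    teacher = {"w": [1.0, 2.0]}
    student = {"w": [0.0, 0.0]}
    # A confident step moves the teacher toward the student more than an uncertain one.
    print(uncertainty_guided_ema_update(teacher, student, uncertainty=0.0, base_momentum=0.9))
    print(uncertainty_guided_ema_update(teacher, student, uncertainty=0.5, base_momentum=0.9))
```

With this schedule, a batch judged fully unreliable leaves the teacher untouched, which is one simple way to limit error accumulation; the paper may weight the update differently, for example per parameter or per sample.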

Related papers