Fugu-MT Paper Translation (Summary): Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception

Paper overview: Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception

arxiv url: http://arxiv.org/abs/2606.11573v1
Date: Wed, 10 Jun 2026 01:57:24 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.247988
Title: Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception
Authors: Xin Qiu, Wenjie Liu, Fuyuan Ai, YuChen Tan, Zhiwei Xu, Chunyi Song,
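Abstract summary: Changes in driving scenes, sensor configurations, and environmental conditions can alter both the input observations and the internal fused representations. This work introduces a framework that characterizes visual scene variations in the frequency domain and uses them to synthesize diverse source-domain views. These variation patterns are used to regularize the detector, encouraging the learned fusion space to remain stable under latent scene changes.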
Reference score (attention score computed by the site): 8.442256203698774
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Radar-camera BEV perception often suffers from degraded performance when evaluated across datasets, as changes in driving scenes, sensor configurations, and environmental conditions can alter both the input observations and the internal fused representations. This work studies this issue from the perspective of source-domain variation modeling, aiming to improve the robustness of BEV-based 3D detectors without relying on target-domain samples. We introduce a framework that characterizes visual scene variations in the frequency domain and uses them to synthesize diverse source-domain views. By comparing the resulting fused BEV representations, the framework further captures how image-level variations influence multi-modal BEV features. These variation patterns are then used to regularize the detector, encouraging the learned fusion space to remain stable under latent scene changes. The proposed method is applied only during training and leaves the inference pipeline unchanged. Experiments on cross-dataset radar-camera 3D detection between View-of-Delft and TJ4DRadSet demonstrate consistent improvements over multiple BEV fusion backbones, and the gains remain effective when a small amount of target-domain data is available.
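The abstract does not say which frequency-domain operation synthesizes the new views or how the comparison of fused BEV representations enters the loss. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes that scene appearance (illumination, color cast, weather) lives mainly in the low-frequency amplitude spectrum of a camera image while the phase carries scene layout, so a new source-domain view is made by blending the low-frequency amplitudes of two source images and keeping the original phase. It further assumes the regularizer is a per-cell cosine distance between the fused bird's-eye-view (BEV) features of the original and synthesized views. The function names (`low_freq_mask`, `amplitude_mix_view`, `bev_consistency_loss`) and parameters (`lam`, `radius`, `eps`) are hypothetical, and NumPy stands in for the autodiff framework a real detector would use.

```python
import numpy as np

# Hypothetical sketch under stated assumptions; not the paper's implementation.


def low_freq_mask(h: int, w: int, radius: float) -> np.ndarray:
    """Boolean (H, W) mask selecting frequencies with |f_y|, |f_x| <= radius.

    Frequencies come from np.fft.fftfreq (cycles per pixel), so the mask is
    symmetric under k -> -k and editing a real image's spectrum inside it
    keeps the spectrum conjugate-symmetric (the inverse FFT stays real).
    """
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    return (fy <= radius) & (fx <= radius)


def amplitude_mix_view(image: np.ndarray, style: np.ndarray,
                       lam: float, radius: float = 0.05) -> np.ndarray:
    """Synthesize a new view of `image` by blending its low-frequency
    amplitude spectrum with that of another source image `style`.

    Assumption: low-frequency amplitude ~ scene appearance, phase ~ layout.
    image, style: float arrays of shape (H, W, C) with values in [0, 1].
    lam: blend weight in [0, 1]; lam = 0 reproduces `image` up to float error.
    """
    spec = np.fft.fft2(image, axes=(0, 1))
    amp, phase = np.abs(spec), np.angle(spec)
    style_amp = np.abs(np.fft.fft2(style, axes=(0, 1)))

    # Blend amplitudes only inside the low-frequency band; keep the phase.
    mask = low_freq_mask(image.shape[0], image.shape[1], radius)[..., None]
    mixed_amp = np.where(mask, (1.0 - lam) * amp + lam * style_amp, amp)

    # The imaginary part is only round-off noise, so keep the real part.
    view = np.fft.ifft2(mixed_amp * np.exp(1j * phase), axes=(0, 1)).real
    return np.clip(view, 0.0, 1.0)


def bev_consistency_loss(bev_ref: np.ndarray, bev_aug: np.ndarray,
                         eps: float = 1e-8) -> float:
    """Mean per-cell cosine distance between two fused BEV feature maps.

    bev_ref, bev_aug: arrays of shape (C, X, Y) produced by the same detector
    for the original and the synthesized view. Returns about 0 when every
    BEV cell keeps its feature direction and up to 2 when directions flip.
    """
    dot = np.sum(bev_ref * bev_aug, axis=0)
    norms = np.linalg.norm(bev_ref, axis=0) * np.linalg.norm(bev_aug, axis=0)
    return float(np.mean(1.0 - dot / (norms + eps)))


# Usage with random stand-ins for camera images and fused BEV features.
rng = np.random.default_rng(0)
img, other = rng.random((64, 64, 3)), rng.random((64, 64, 3))
view = amplitude_mix_view(img, other, lam=0.5)
print(view.shape)  # (64, 64, 3)

bev_ref = rng.standard_normal((16, 32, 32))
bev_aug = bev_ref + 0.1 * rng.standard_normal((16, 32, 32))
print(bev_consistency_loss(bev_ref, bev_aug))  # small positive value
```

Under these assumptions, the consistency term would be added to the detection loss for each original/synthesized pair during training only. Because nothing changes in the forward pass used at test time, this is compatible with the abstract's statement that the method leaves the inference pipeline unchanged.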

Related papers