Fugu-MT Paper Translation (Abstract): Linking Modality Isolation in Heterogeneous Collaborative Perception

Paper overview: Linking Modality Isolation in Heterogeneous Collaborative Perception

arxiv url: http://arxiv.org/abs/2603.00609v1
Date: Sat, 28 Feb 2026 12:09:08 GMT
Status: Translation complete
Last updated in system: 2026-03-03 19:50:56.289657
Title: Linking Modality Isolation in Heterogeneous Collaborative Perception
Authors: Changxing Liu, Zichen Chao, Siheng Chen
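Abstract summary: We propose CodeAlign, a framework that smoothly aligns modalities via cross-modal feature-code-feature (FCF) translation. CodeAlign learns FCF translations that map features to the corresponding codes of other modalities and then decode them into features in the target code space. When integrating three modalities, CodeAlign requires only 8% of the training parameters of prior alignment methods, reduces communication load by 1024x, and achieves state-of-the-art perception performance on both the OPV2V and DAIR-V2X datasets.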
Reference score (independently computed attention score): 41.68601421239159
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Collaborative perception leverages data exchange among multiple agents to enhance overall perception capabilities. However, heterogeneity across agents introduces domain gaps that hinder collaboration, and this is further exacerbated by an underexplored issue: modality isolation. It arises when agents with different modalities never co-occur in any training data frame, which enlarges cross-modal domain gaps. Existing alignment methods rely on supervision from spatially overlapping observations and thus fail to handle modality isolation. To address this challenge, we propose CodeAlign, the first efficient, co-occurrence-free alignment framework that smoothly aligns modalities via cross-modal feature-code-feature (FCF) translation. The key idea is to explicitly identify representation consistency through codebooks and to directly learn mappings between modality-specific feature spaces, thereby eliminating the need for spatial correspondence. Codebooks regularize feature spaces into code spaces, providing compact yet expressive representations. With a prepared code space for each modality, CodeAlign learns FCF translations that map features to the corresponding codes of other modalities, which are then decoded back into features in the target code space, enabling effective alignment. Experiments show that, when integrating three modalities, CodeAlign requires only 8% of the training parameters of prior alignment methods, reduces communication load by 1024x, and achieves state-of-the-art perception performance on both the OPV2V and DAIR-V2X datasets. Code will be released at https://github.com/cxliu0314/CodeAlign.
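The feature-code-feature idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not describe the architecture, so the codebooks, the 2-D feature values, and the fixed index-to-index table `lidar_to_camera` below are all hypothetical stand-ins (in CodeAlign the translation is learned, not hand-written). The sketch only shows the three conceptual steps: quantize a source feature to its nearest code, map that code to a code of another modality, and decode it into a feature in the target code space.

```python
import math


def nearest_code(feature, codebook):
    """Return the index of the codebook entry closest to `feature` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature, codebook[i]))


def fcf_translate(feature, src_codebook, code_map, tgt_codebook):
    """Toy feature -> code -> feature translation.

    1. Quantize the source feature to its nearest source code index.
    2. Map that index to a target-modality code index. Here this is a fixed
       lookup table (an assumption for illustration; a real system learns it).
    3. Decode by returning the corresponding target codebook vector.
    """
    src_idx = nearest_code(feature, src_codebook)
    tgt_idx = code_map[src_idx]
    return tgt_idx, tgt_codebook[tgt_idx]


# Hypothetical 2-D codebooks for two modalities (values are made up).
lidar_codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
camera_codebook = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]

# Hypothetical mapping from LiDAR code indices to camera code indices.
lidar_to_camera = {0: 2, 1: 0, 2: 1}

idx, decoded = fcf_translate((0.9, 1.2), lidar_codebook, lidar_to_camera, camera_codebook)
print(idx, decoded)  # 0 (10.0, 10.0)
```

Note that no step in the sketch needs a LiDAR and a camera observation of the same scene, which is the property the abstract calls co-occurrence-free. The function also returns the code index because an index is far more compact than a dense feature. Whether CodeAlign actually transmits indices is not stated in the abstract.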

Related papers