Fugu-MT Paper Translation (Abstract): Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet

Paper summary: Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet

arxiv url: http://arxiv.org/abs/2605.16887v1
Date: Sat, 16 May 2026 09:00:54 GMT
Status: Translation complete
Updated in system: 2026-05-19 17:57:47.219469
Title: Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet
Authors: Xin Niu, Enyi Li, Jinchao Liu, Yan Wang, Margarita Osadchy, Yongchun Fang
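Abstract summary: Cross-modality recognition has many important applications in science, law enforcement, and entertainment. The authors propose a compact encoder-decoder neural module (cmUNet) that learns modality-agnostic representations while retaining identity-related information. For cross-modality matching, they propose MarrNet, in which cmUNet is connected to a standard feature extraction network that takes the modality-agnostic representations as input and outputs similarity scores.
Reference score (attention score computed by this site): 21.513499794627165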
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-modality recognition has many important applications in science, law enforcement, and entertainment. Popular methods for bridging the modality gap include reducing the distributional differences between representations of different modalities, learning indistinguishable representations, and explicit modality transfer. The first two approaches lose discriminant information while removing modality-specific variations. The third relies heavily on successful modality transfer and can suffer a catastrophic performance drop when explicit modality transfer is impossible or difficult. To tackle this problem, we propose a compact encoder-decoder neural module (cmUNet) that learns modality-agnostic representations while retaining identity-related information. This is achieved through cross-modality transformation and in-modality reconstruction, enhanced by an adversarial/perceptual loss that encourages indistinguishability of representations in the original sample space. For cross-modality matching, we propose MarrNet, in which cmUNet is connected to a standard feature extraction network that takes the modality-agnostic representations as input and outputs similarity scores for matching. We validated our method on five challenging tasks, namely Raman-infrared spectrum matching, cross-modality person re-identification, and heterogeneous (photo-sketch, visible-near infrared, and visible-thermal) face recognition, where MarrNet outperformed state-of-the-art methods. Furthermore, we observe that a cross-modality matching method can be biased toward extracting discriminant information from partial or even wrong regions because it fails to handle the modality gap, which in turn leads to poor generalization. We show that robustness to occlusions can indicate whether a method bridges the modality gap well.
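The abstract names two ingredients without giving formulas: a training objective that combines in-modality reconstruction, cross-modality transformation, and an adversarial/perceptual term, and a matching stage that outputs similarity scores. The sketch below is only an illustration of how such pieces could fit together. The mean squared error, the weighted sum with weight `lam`, the cosine similarity, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math


def mse(output, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def combined_loss(in_modality_pairs, cross_modality_pairs,
                  indistinguishability_term=0.0, lam=0.1):
    """Hypothetical objective: reconstruction + transformation + weighted extra term.

    in_modality_pairs:    (decoder output, same-modality target) pairs
    cross_modality_pairs: (decoder output, other-modality target) pairs
    indistinguishability_term: stand-in scalar for an adversarial/perceptual
        loss, computed elsewhere (assumed, not specified in the abstract)
    lam: assumed weight on that term
    """
    recon = sum(mse(o, t) for o, t in in_modality_pairs) / len(in_modality_pairs)
    trans = sum(mse(o, t) for o, t in cross_modality_pairs) / len(cross_modality_pairs)
    return recon + trans + lam * indistinguishability_term


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query, gallery):
    """Score a query feature against each gallery feature; return (index, scores)."""
    scores = [cosine_similarity(query, g) for g in gallery]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

In this toy setup, `query` would be a feature vector from one modality (for example a sketch) and `gallery` a list of feature vectors from the other (for example photos); the index with the highest score is taken as the match.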
