Fugu-MT Paper Translation (Abstract): Anisotropic Modality Align

Paper summary: Anisotropic Modality Align

arxiv url: http://arxiv.org/abs/2605.07825v1
Date: Fri, 08 May 2026 14:53:24 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:39.135696
Title: Anisotropic Modality Align
Title (reference translation): Anisotropic Modality Alignment
Authors: Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng Yan, Hui Xiong
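Abstract summary: Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling multimodal training with unimodal data. The core obstacle lies in the persistent modality gap in the shared space.
Reference score (independently computed attention metric): 91.23979617826926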
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, this work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and provides a new representation alignment perspective for training multimodal models with unimodal data.
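Abstract (reference translation): Training multimodal large language models has long been limited by a shortage of high-quality paired multimodal data. Recent work shows that the shared representation space of pretrained multimodal contrastive models can act as a bridge, making multimodal training possible with unimodal data. However, the key premise of this paradigm is still not well understood: can representations from different modalities be reliably interchanged? The central obstacle is the persistent modality gap in the shared space. This work revisits the geometric nature of the modality gap. Modality representations already share a compatible dominant semantic geometry. What actually hinders interchangeability is not a simple global shift but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this, the work proposes the principle of anisotropic modality gap alignment: effective modality alignment should match the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, it proposes AnisoAlign, an anisotropic geometric correction framework for unpaired modality alignment. The framework uses the internal geometric prior of the target modality to apply a bounded correction to source-modality representations, constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, the work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and offers a new representation-alignment perspective for training multimodal models with unimodal data.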
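
The abstract describes the gap as an anisotropic residual concentrated along a few dominant directions, and the fix as a bounded correction of source representations guided by the target modality's geometry. The toy sketch below illustrates that general idea only. It is not the AnisoAlign method, whose details are not given here. The function names (`top_k_variance_share`, `bounded_correction`), the parameters `k` and `max_scale`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def top_k_variance_share(x, k):
    """Fraction of total variance captured by the k largest principal directions."""
    xc = x - x.mean(axis=0)
    # Squared singular values are proportional to per-direction variance.
    s = np.linalg.svd(xc, compute_uv=False)
    var = s ** 2
    return float(var[:k].sum() / var.sum())

def bounded_correction(src, tgt, k=2, max_scale=3.0):
    """Toy correction (an assumption, not the paper's algorithm):
    match the target mean, then rescale the source along the target's
    top-k principal directions, with the scale clipped to a bounded range."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    src_c, tgt_c = src - mu_s, tgt - mu_t
    # Rows of vt are the target's orthonormal principal directions.
    _, _, vt = np.linalg.svd(tgt_c, full_matrices=False)
    dirs = vt[:k]                      # shape (k, d)
    src_proj = src_c @ dirs.T          # shape (n, k)
    tgt_proj = tgt_c @ dirs.T
    scale = tgt_proj.std(axis=0) / (src_proj.std(axis=0) + 1e-12)
    scale = np.clip(scale, 1.0 / max_scale, max_scale)  # keep the correction bounded
    # Only components along the chosen directions change; the rest is preserved.
    corrected = src_c + (src_proj * (scale - 1.0)) @ dirs
    return corrected + mu_t

# Synthetic stand-ins for unpaired source and target embeddings.
rng = np.random.default_rng(0)
n, d = 2000, 16
src = rng.normal(size=(n, d))
stretch = np.ones(d)
stretch[:2] = [2.0, 1.5]               # target is stretched along two directions
tgt = rng.normal(size=(n, d)) * stretch + 0.5

out = bounded_correction(src, tgt, k=2)
print("top-2 variance share, source   :", top_k_variance_share(src, 2))
print("top-2 variance share, target   :", top_k_variance_share(tgt, 2))
print("top-2 variance share, corrected:", top_k_variance_share(out, 2))
```

With `max_scale=1.0` the clipping forces every scale to 1, so the function reduces to a plain mean shift. That makes it easy to compare a global-shift baseline against a correction that also adjusts a few dominant directions.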

Related papers