Fugu-MT Paper Translation (Summary): On the Nature of Attention Sink that Shapes Decoding Strategy in MLLMs

Paper Summary: On the Nature of Attention Sink that Shapes Decoding Strategy in MLLMs

arxiv url: http://arxiv.org/abs/2603.14337v1
Date: Sun, 15 Mar 2026 12:05:35 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.757147
Title: On the Nature of Attention Sink that Shapes Decoding Strategy in MLLMs
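Authors: Suho Yoo, Youngjoon Jang, Joon Son Chung
Abstract summary: OutRo is a lightweight inference-time strategy that leverages the sink token to enhance contextual representations. In experiments, OutRo consistently improves the performance of representative MLLMs on seven video QA benchmarks.
Reference score (site-computed interest score): 38.05844382560401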
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models and their multimodal extensions have achieved remarkable success across diverse tasks, yet the internal mechanisms that govern their reasoning behaviour remain partially understood. In particular, the attention sink, a token that attracts disproportionate attention mass, has been observed in transformer architectures, but its role is still unclear. Our goal is to understand what attention sinks represent and how they shape model behaviour during inference, rather than considering them as incidental artifacts. Through our analysis, we find that attention sink representations encode structured global information that influences the decoding process. Building on our findings, we introduce OutRo, a lightweight inference-time strategy that leverages the sink token to enhance contextual representations: (i) non-sink token representations are aligned with the sink representation in the feature space; and (ii) the sink token is allowed to attend beyond the causal constraint, facilitating information exchange with non-sink tokens. This design enhances the reasoning process without requiring additional forward passes or access to attention maps. Based on extensive experiments, OutRo consistently improves performance across representative MLLMs on seven video QA benchmarks and demonstrates strong generalisation, while incurring only a 1.1x decoding overhead.
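The abstract defines an attention sink as a token that attracts disproportionate attention mass, and says OutRo lets the sink token attend beyond the causal constraint. Below is a minimal, hypothetical Python sketch of those two ideas on toy vectors; it is not the authors' implementation. All function names are illustrative. `find_sink` picks the key with the largest mean attention weight received from the queries that can see it. This is an assumed heuristic for illustration only, since OutRo itself is described as not needing attention maps. `sink_relaxed_mask` is a causal mask whose sink row, assumed here to be position 0, is fully unmasked. Step (i), aligning non-sink representations with the sink, is not sketched because the abstract does not specify the operation.

```python
# Hypothetical sketch of an attention sink and a sink-relaxed causal mask.
# Not the OutRo implementation; names and the sink position are assumptions.
import math
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    # Standard causal mask: query i may attend to keys 0..i only.
    return [[j <= i for j in range(n)] for i in range(n)]


def sink_relaxed_mask(n: int, sink: int = 0) -> Mask:
    # Causal mask, except the (assumed) sink row may attend to every position,
    # i.e. it can look "beyond the causal constraint".
    mask = causal_mask(n)
    mask[sink] = [True] * n
    return mask


def attention_weights(q: Matrix, k: Matrix, mask: Mask) -> Matrix:
    # Scaled dot-product weights softmax(q k^T / sqrt(d)); masked keys get 0.
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # finite: every row keeps at least its diagonal entry
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def find_sink(weights: Matrix, mask: Mask) -> int:
    # Key position with the largest mean received attention, averaged only
    # over queries allowed to see it (so early tokens are not favoured
    # merely because more causal queries can reach them).
    n = len(weights)
    best, best_score = 0, -1.0
    for j in range(n):
        visible = [weights[i][j] for i in range(n) if mask[i][j]]
        score = sum(visible) / len(visible)
        if score > best_score:
            best, best_score = j, score
    return best


# Toy usage: key 0 aligns with every query, so it soaks up attention mass.
q = [[1.0, 0.0]] * 4
k = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
w = attention_weights(q, k, causal_mask(4))
print(find_sink(w, causal_mask(4)))  # prints 0
```

Averaging only over visible queries is a deliberate choice. Under a causal mask, raw column sums favour early positions, so this normalisation separates a genuine sink from a positional effect.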

Related Papers

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization [56.083511902353365]
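Reinforcement learning (RL) typically applies uniform credit across an entire generation of a large language model. This work positions attention as a privileged substrate that renders the internal logic of LLMs as a mechanistic blueprint of reasoning itself. It introduces three novel RL strategies that dynamically perform targeted credit assignment to critical nodes.
Paper reference translation (metadata) (2025-10-15T13:49:51Z)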
Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention [0.19116784879310025]
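We propose a self-supervised learning framework for handwritten mathematical expression recognition (HMER). Our approach begins by pretraining an image encoder with a combination of global and local contrastive losses. A key contribution of this work is a novel self-supervised attention network trained with a progressive spatial masking strategy.
Paper reference translation (metadata) (2025-08-08T08:11:36Z)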
Artifacts and Attention Sinks: Structured Approximations for Efficient Vision Transformers [8.486148475471271]
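Vision transformers have emerged as powerful tools across a wide range of applications, yet their inner workings remain only partially understood. We study the phenomenon of massive tokens (tokens with exceptionally high activation norms that act as attention sinks) and of artifact tokens that emerge as a byproduct during inference. We introduce Fast Nyström Attention (FNA), a training-free method that approximates self-attention in linear time and space.
Paper reference translation (metadata) (2025-07-21T19:29:03Z)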
From Compression to Expression: A Layerwise Analysis of In-Context Learning [24.45948310980883]
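In-context learning (ICL) enables large language models to adapt to new tasks without weight updates by learning from demonstration sequences. We perform a statistical and geometric analysis of ICL representations to investigate how task-specific information is captured across layers. The results reveal how structured, layer-wise dynamic representations for ICL emerge within LLMs, and suggest that analyzing internal representations fosters a deeper understanding of model behavior.
Paper reference translation (metadata) (2025-05-22T22:22:03Z)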
Core Context Aware Transformers for Long Context Language Modeling [50.774702091154204]
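We propose CCA-Attention for efficient long-context modeling. The method automatically focuses on and strengthens the core context while reducing redundancy during learning. It can replace the self-attention modules in existing large language models at minimal fine-tuning cost.
Paper reference translation (metadata) (2024-12-17T01:54:08Z)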
Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use [74.72150542395487]
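An inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance on tasks that demand high context awareness. To address this problem, we propose a novel inference method called Attention Buckets.
Paper reference translation (metadata) (2023-12-07T17:24:51Z)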

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

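Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from its use (including all information and translations).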