Fugu-MT Paper Translation (Abstract): Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation

Paper overview: Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation

arxiv url: http://arxiv.org/abs/2603.11067v1
Date: Tue, 10 Mar 2026 06:07:54 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.492838
Title: Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
Title (reference translation): Summarize Before Speaking with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Allocation
Authors: Jingtao Wang, Yucong Wang, Jun Ding, Rui Cai, Xun Wang
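Abstract summary: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques, especially training-free approaches that improve models at inference time without updating weights. The paper proposes ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that uses an adaptive context hub.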
Reference score (independently computed attention metric): 9.508727214134106
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques, especially training-free approaches that improve models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs via input/output-level interventions, such as prompt design and test-time scaling through repeated sampling, reranking/verification, or search. In contrast, they rarely offer a plug-and-play mechanism to intervene in a model's internal computation. We propose ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that augments LLMs with an adaptive context hub to aggregate context and reallocate attention. Extensive experiments across multiple language modeling tasks show consistent improvements with modest inference overhead and no parameter updates. Attention analyses further suggest that ARACH mitigates the attention sink phenomenon. These results indicate that engineering a model's internal computation offers a distinct inference-time strategy, fundamentally different from both prompt-based test-time methods and training-based post-training approaches.
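Abstract (reference translation): Large language models (LLMs) achieve excellent performance, but further gains often require costly training. This motivates growing interest in post-training techniques, especially training-free approaches that improve a model at inference time without updating its weights. Most training-free methods treat the model as a black box and improve outputs through input/output-level interventions. In contrast, they rarely provide a plug-and-play mechanism for intervening in the model's internal computation. This paper proposes ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that uses an adaptive context hub. Extensive experiments across multiple language modeling tasks show consistent improvements with modest inference overhead and no parameter updates. Attention analyses suggest that ARACH mitigates the attention sink phenomenon. These results indicate that engineering a model's internal computation offers an inference-time strategy that is fundamentally different from both prompt-based test-time methods and training-based post-training approaches.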

Related papers