Fugu-MT Paper Translation (Overview): All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

Paper overview: All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

arxiv url: http://arxiv.org/abs/2509.09650v1
Date: Thu, 11 Sep 2025 17:41:29 GMT
Status: Translation complete
In-system update date: 2025-09-12 16:52:24.496626
Title: All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
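Title (reference translation): "All For One": LLMs Unravel Mental Math
Authors: Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou
Abstract summary: In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. We investigate this in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers.
Reference score (notability, computed in-house): 14.890542559477906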
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers. With two proposed techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with high accuracy on a wide variety of mental math tasks, where meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information of other tokens in few specific middle layers. Experiments on a variety of models and arithmetic expressions show that this subgraph is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest.
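CAMA is, by its name, a refinement of mean ablation, a standard way to inhibit input-specific computation at chosen token positions. The Python sketch below shows only plain mean ablation, under stated assumptions: it replaces a position's activation with its average over a reference set of inputs, so that position no longer carries information specific to the current input. The "context-aware" part of CAMA is not reproduced here, and the names `mean_activation` and `mean_ablate` as well as the toy reference set are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_activation(activations: Sequence[Vector]) -> Vector:
    """Element-wise mean of one token position's activations over a reference set of inputs."""
    if not activations:
        raise ValueError("need at least one activation")
    n = len(activations)
    return [sum(a[i] for a in activations) / n for i in range(len(activations[0]))]


def mean_ablate(hidden: List[Vector], positions: Sequence[int], means: List[Vector]) -> List[Vector]:
    """Copy `hidden` (hypothetical helper), replacing each listed position with its reference-set mean.

    hidden[p]: activation at token position p for the current input.
    means[p]:  mean activation at position p over the reference inputs.
    Positions not listed keep their input-specific activation.
    """
    ablated = [list(v) for v in hidden]  # copy so the caller's activations are untouched
    for p in positions:
        ablated[p] = list(means[p])
    return ablated


# Hypothetical toy activations: 2 reference inputs, 3 token positions, 2 dimensions.
reference = [
    [[1.0, 0.0], [2.0, 2.0], [0.0, 4.0]],
    [[3.0, 2.0], [4.0, 0.0], [2.0, 0.0]],
]
means = [mean_activation([run[p] for run in reference]) for p in range(3)]
current = [[9.0, 9.0], [7.0, 7.0], [5.0, 5.0]]
print(mean_ablate(current, positions=[0, 1], means=means))
# [[2.0, 1.0], [3.0, 1.0], [5.0, 5.0]]
```

Positions 0 and 1 now hold only what is common across the reference inputs, while position 2 keeps its input-specific value; which positions and layers to ablate is a choice left to the caller in this sketch.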
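Abstract (reference translation): Large language models (LLMs) show proficiency across many computational tasks, but their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, we study mental math tasks (i.e., direct calculation via next-token prediction without explicit reasoning) in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers. Using two techniques, CAMA (Context-Aware Mean Ablation) and ABP (Attention-Based Peeking), we identify a high-accuracy All-for-One subgraph (AF1) on a wide variety of mental math tasks, in which meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information from other tokens in a few specific middle layers. Experiments on various models and arithmetic expressions show that this subgraph is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest.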
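The three-step restriction described in the abstract can be pictured as a per-layer attention-mask schedule over token positions. The Python sketch below is a simplified illustration under stated assumptions, not the authors' CAMA or ABP code: the helper names, the layer boundaries `early_end` and `peek_end`, the single band of "peek" layers, and blocking all cross-token reads outside that band are hypothetical choices. The abstract says only that the last token receives other tokens' information in a few specific middle layers; it gives no layer indices.

```python
import math
from typing import List

Mask = List[List[bool]]  # mask[q][k] is True if query position q may read key position k


def self_only_mask(n: int) -> Mask:
    """Every position reads only itself: no information moves across positions."""
    return [[k == q for k in range(n)] for q in range(n)]


def last_token_peek_mask(n: int) -> Mask:
    """The last position reads every position; all others read only themselves."""
    mask = self_only_mask(n)
    mask[n - 1] = [True] * n
    return mask


def af1_style_schedule(n_tokens: int, n_layers: int, early_end: int, peek_end: int) -> List[Mask]:
    """Hypothetical per-layer masks: cross-token reads happen only in layers [early_end, peek_end)."""
    if not 0 <= early_end <= peek_end <= n_layers:
        raise ValueError("need 0 <= early_end <= peek_end <= n_layers")
    return [
        last_token_peek_mask(n_tokens) if early_end <= layer < peek_end else self_only_mask(n_tokens)
        for layer in range(n_layers)
    ]


def masked_attention(scores: List[List[float]], values: List[List[float]], mask: Mask) -> List[List[float]]:
    """Softmax attention in which disallowed (query, key) pairs get zero weight.

    Assumes every query row allows at least one key, which holds for the masks above.
    """
    out = []
    for q, row in enumerate(scores):
        allowed = [k for k in range(len(row)) if mask[q][k]]
        top = max(row[k] for k in allowed)  # subtract the max for numerical stability
        weights = {k: math.exp(row[k] - top) for k in allowed}
        total = sum(weights.values())
        dim = len(values[0])
        out.append([sum(weights[k] * values[k][d] for k in allowed) / total for d in range(dim)])
    return out


schedule = af1_style_schedule(n_tokens=4, n_layers=6, early_end=2, peek_end=4)
print([sum(mask[-1]) for mask in schedule])  # positions the last token reads, per layer
# [1, 1, 4, 4, 1, 1]

values = [[1.0], [2.0], [3.0], [4.0]]
scores = [[0.0] * 4 for _ in range(4)]  # uniform scores, so allowed keys are averaged
print(masked_attention(scores, values, schedule[2]))  # layer 2 is a "peek" layer
# [[1.0], [2.0], [3.0], [2.5]]
```

With uniform scores, a peek layer lets the last token average every position's value while the other positions keep their own; in every other layer each position's output depends only on that position, so any cross-token computation must route through the peek band.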

Related papers