Fugu-MT Paper Translation (Abstract): The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

Paper abstract: The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

arxiv url: http://arxiv.org/abs/2605.03258v1
Date: Tue, 05 May 2026 01:13:06 GMT
Status: Translation complete
System update date: 2026-05-06 19:35:43.701291
Title: The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It
Authors: Gabriel Garcia
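Abstract summary: Large language models often fail at simple counting tasks, even when the items to count are explicitly present in the prompt. We investigate whether this failure occurs because transformers do not represent counts internally, or because they cannot convert those representations into the correct output tokens. Linear probes recover the correct count from intermediate layers with near-perfect accuracy.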
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models often fail at simple counting tasks, even when the items to count are explicitly present in the prompt. We investigate whether this failure occurs because transformers do not represent counts internally, or because they cannot convert those representations into the correct output tokens. Across three model families, Pythia, Qwen3, and Mistral, ranging from 0.4B to 14B parameters, we find strong evidence for the second explanation. Linear probes recover the correct count from intermediate layers with near-perfect accuracy ($R^2>0.99$), showing that the information is present. However, the internal directions that encode counts are nearly orthogonal to the output-head rows for digit tokens ($|\cos|\leq0.032$). In other words, the model stores the count in a form that the digit logits do not naturally read out. We localize this failure with two interventions. Updating only the digit rows of the output head (36,864 parameters) substantially improves constrained next-token digit prediction (60.7 to 100.0% across four tasks), but it does not fix autoregressive generation. By contrast, a small LoRA intervention on attention Q/V weights (7.67M parameters) improves upstream routing and achieves 83.1% +/- 7.2% in true greedy autoregressive generation. Logit-lens measurements confirm the mechanism: the correct digit's vocabulary rank drops from 55,980 to 1, a 50,000x improvement. Additional norm, logit-lens, and cross-task analyses show that the bottleneck generalizes across character counting, addition, and list length, while remaining absent from broader multi-step reasoning benchmarks, including MMLU, GSM8K, and DROP. These results identify counting failure as a geometric readout bottleneck rather than a failure of internal representation: the model knows the count but the output pathway is geometrically misaligned with the tokens needed to express it.
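The three measurements named in the abstract (a linear probe's $R^2$, the cosine between the count direction and the digit rows of the output head, and the vocabulary rank of a token) can be illustrated with a minimal NumPy sketch. It runs on synthetic data only and is not the paper's code: the toy sizes, the noise level, the way the count is embedded, and the helper names `fit_linear_probe`, `max_abs_cosine`, and `token_rank` are all assumptions made for illustration. The digit rows are made orthogonal to the count direction by construction, to mimic the geometry the abstract describes.

```python
import numpy as np


def fit_linear_probe(hidden, counts):
    """Least-squares probe counts ~ hidden @ w + b; returns (w, b, in-sample R^2)."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, counts, rcond=None)
    pred = X @ coef
    ss_res = np.sum((counts - pred) ** 2)
    ss_tot = np.sum((counts - counts.mean()) ** 2)
    return coef[:-1], coef[-1], 1.0 - ss_res / ss_tot


def max_abs_cosine(direction, rows):
    """Largest |cos| between one direction and each row of a matrix."""
    d = direction / np.linalg.norm(direction)
    r = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return float(np.max(np.abs(r @ d)))


def token_rank(logits, token_id):
    """1-based rank of token_id among the logits (1 = highest logit)."""
    return int(np.sum(logits > logits[token_id])) + 1


# Toy sizes (assumptions, far smaller than any real model).
rng = np.random.default_rng(0)
d_model, n_samples, vocab = 64, 4000, 1000

# Synthetic hidden states: the count lies along one unit direction, plus small noise.
count_dir = rng.standard_normal(d_model)
count_dir /= np.linalg.norm(count_dir)
counts = rng.integers(0, 10, size=n_samples).astype(float)
hidden = np.outer(counts, count_dir) + 0.01 * rng.standard_normal((n_samples, d_model))

# Synthetic output head; the ten "digit" rows are projected to be orthogonal
# to count_dir, so the digit logits cannot read the count out directly.
W_out = rng.standard_normal((vocab, d_model))
digit_ids = np.arange(10)
W_out[digit_ids] -= np.outer(W_out[digit_ids] @ count_dir, count_dir)

w, b, r2 = fit_linear_probe(hidden, counts)
cos = max_abs_cosine(w, W_out[digit_ids])

# Rank of the correct digit token for one example with count 7.
h = 7.0 * count_dir
rank_of_7 = token_rank(W_out @ h, 7)

print(f"probe R^2 = {r2:.4f}, max |cos| to digit rows = {cos:.4f}, rank of '7' = {rank_of_7}")
```

In this toy setup the probe recovers the count almost perfectly while the digit rows barely see it, which is the "information present, readout misaligned" pattern the abstract reports. On a real model, `hidden` would be intermediate-layer activations and `W_out` the actual output-head matrix.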

Related papers