Fugu-MT Paper Translation (Abstract): Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

Paper abstract: Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

arxiv url: http://arxiv.org/abs/2605.22488v1
Date: Thu, 21 May 2026 13:43:25 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.280535
Title: Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer
Authors: Ishita Darade, Sushrut Thorat
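Abstract summary: A Transformer trained on base-digit extraction reaches 99.83% accuracy on held-out number-base intersections. A sparse circuit search finds mostly separate $N$, $B$, and $D$ routes that combine late, rather than the staged route suggested by the probes.
Reference score (attention score computed independently by this site): 0.0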
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structured prompts require integrating components according to task-relevant relations. How a network implements this integration is often hard to judge in language or vision, where those relations are rarely specified precisely enough to define a candidate internal algorithm. Arithmetic offers a cleaner setting. We study a Transformer trained on base-digit extraction: given $N$, $B$, and $D$, it must report the coefficient of $B^D$ in the base-$B$ expansion of $N$. The closed-form solution, $\lfloor N/B^D \rfloor \bmod B$, provides explicit candidate algorithmic intermediates. Across three seeds, the model reaches 99.83% exact-answer accuracy on held-out number-base intersections, establishing reliable task competence. Linear probes decode the intermediates, making staged arithmetic computation plausible. Causal tests then separate representation from use: within the localized route from the stream with $D$ as input to the output positions, behavior depends on early $D$-selective communication, independent of $N$ and $B$. Relatedly, a sparse circuit search finds mostly separate $N$, $B$, and $D$ routes that combine late rather than the staged route suggested by the probes. Thus, the model represents the intermediates that make the closed-form solution plausible, but the identified localized causal route does not transmit them to the output stream. This case shows that probe-based conclusions can diverge sharply from causal observations, even when explicit algorithmic hypotheses are available.
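
The task and closed-form solution stated in the abstract can be illustrated with a minimal Python sketch. The function names `base_digit` and `digits_by_expansion` are hypothetical and chosen only for illustration; the code shows the arithmetic the model is trained to reproduce and the candidate intermediates ($B^D$ and $\lfloor N/B^D \rfloor$), not the authors' implementation or the algorithm the model actually uses.

```python
def base_digit(n: int, b: int, d: int) -> int:
    """Coefficient of b**d in the base-b expansion of n.

    Implements the closed form floor(n / b**d) mod b from the abstract.
    Input validation is an assumption added for this sketch.
    """
    if n < 0 or b < 2 or d < 0:
        raise ValueError("expected n >= 0, b >= 2, d >= 0")
    power = b ** d         # candidate intermediate: B^D
    quotient = n // power  # candidate intermediate: floor(N / B^D)
    return quotient % b


def digits_by_expansion(n: int, b: int) -> list[int]:
    """Reference check: base-b digits of n by repeated division,
    least-significant digit first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, b)
        digits.append(remainder)
    return digits


# 37 = 1*27 + 1*9 + 0*3 + 1, so its base-3 digits (least significant first)
# are [1, 0, 1, 1].
print(digits_by_expansion(37, 3))
print(base_digit(37, 3, 2))  # coefficient of 3^2
```

The repeated-division helper is included only as an independent check that the closed form returns the same digit at every position.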

Related papers