Fugu-MT paper translation (abstract): Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

Paper overview: Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

arxiv url: http://arxiv.org/abs/2606.05378v1
Date: Wed, 03 Jun 2026 19:27:07 GMT
Status: translation complete
System update date: 2026-06-05 22:39:44.362763
Title: Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models
Authors: Yongzhong Xu
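Abstract summary: The authors run a unified protocol with the matched-random null sampled across ten seeds per cell. The resulting 12 (task, model) cells contain no two that share the same primary causal screen at comparable effect size. They hypothesize that the MoE model in the panel builds composed-task circuits on top of a foundational previous-token positional substrate.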
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We test whether a single screen-and-ablate recipe -- identify attention-head circuits by task-pattern selectivity, then verify by causal ablation against a matched-random null -- produces consistent mechanistic claims across model families. The recipe ports across pipelines; the specific circuit it identifies does not. Across four composed tasks (indirect-object identification, greater-than, successor sequences, variable binding) and three 1B-class language models from distinct training pipelines (Pythia 1B / Pile / dense; OLMo 1B / DCLM / dense; OLMoE 1B-7B / DCLM / mixture-of-experts), we run a unified protocol with the matched-random null sampled across ten seeds per cell. The resulting 12 (task, model) cells contain no two that share the same primary causal screen at comparable effect size: the same task, with the same behavioral capability, is implemented through different attention-pattern types across models. We introduce a five-category screen-outcome taxonomy -- primary cause, secondary cause, correlate, interferer, null -- with quantitative thresholds, and show that all five outcomes appear in the panel. We propose a falsifiable hypothesis: the MoE model in our panel builds composed-task circuits on top of a foundational previous-token positional substrate (the prev-token-circuit ablation is the strongest causal screen on 3 of 4 tasks for OLMoE 1B-7B), with the IOI exception consistent with IOI being a final-position name-copying task whose structure directly probes a different pattern. The hypothesis comes with explicit predictions for other MoE language models. We frame the methodology honestly: the spectral participation-ratio signal from the companion methodology paper is a general indicator of specialized computation; what makes a finding task-specific is the task-pattern screen plus a per-model causal verification.
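The "verify by causal ablation against a matched-random null" step in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function `matched_random_null`, the size-only matching rule, the z-score summary, and the toy effect function are all assumptions made for illustration. In a real run `ablation_effect` would measure the drop in task performance after ablating the given attention heads in a model.

```python
import random
import statistics
from typing import Callable, Sequence

Head = tuple[int, int]  # (layer, head index); a hypothetical representation


def matched_random_null(
    screened_heads: Sequence[Head],
    all_heads: Sequence[Head],
    ablation_effect: Callable[[Sequence[Head]], float],
    n_seeds: int = 10,
) -> dict[str, float]:
    """Compare ablating the screened heads with size-matched random head sets.

    Assumption: "matched" means only "same number of heads"; the paper may
    match on further properties (for example, layer) not stated in the abstract.
    """
    screened = set(screened_heads)
    pool = [h for h in all_heads if h not in screened]
    k = len(screened_heads)

    # One random head set per seed, mirroring "ten seeds per cell".
    null_effects = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        null_effects.append(ablation_effect(rng.sample(pool, k)))

    observed = ablation_effect(screened_heads)
    null_mean = statistics.mean(null_effects)
    null_sd = statistics.stdev(null_effects)
    # z-score of the screened-set effect relative to the random null;
    # undefined (NaN) if the null has no spread.
    z = (observed - null_mean) / null_sd if null_sd > 0 else float("nan")
    return {
        "observed": observed,
        "null_mean": null_mean,
        "null_max": max(null_effects),
        "z": z,
    }


def toy_effect(heads: Sequence[Head]) -> float:
    """Toy stand-in for a real ablation: two heads matter, the rest barely do."""
    important = {(1, 2): 0.5, (2, 0): 0.4}
    return sum(important.get(h, 0.01 * (h[0] + h[1])) for h in heads)


all_heads = [(layer, head) for layer in range(4) for head in range(4)]
result = matched_random_null([(1, 2), (2, 0)], all_heads, toy_effect)
print(result)
```

In this toy setting the screened set's ablation effect is well above every random draw, which is the kind of separation a causal screen would need to show before a pattern-selective head set is treated as task-causal. The abstract does not give the quantitative thresholds of the five-category taxonomy, so no classification rule is sketched here.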
