Fugu-MT paper translation (abstract): The Attribution Contract: Feature Attribution for Generative Language Models

Paper summary: The Attribution Contract: Feature Attribution for Generative Language Models

arxiv url: http://arxiv.org/abs/2605.23080v1
Date: Thu, 21 May 2026 22:27:04 GMT
Status: translation complete
Last updated in system: 2026-05-25 17:29:20.123151
Title: The Attribution Contract: Feature Attribution for Generative Language Models
Authors: Giang Nguyen
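Abstract summary: Feature attribution methods promise to identify which input features matter for a model output. In generative language models, it is often unclear what should count as a feature in the first place. We introduce the Attribution Contract.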
Reference score (independently computed attention measure): 1.6001421987996292
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Feature attribution methods promise to identify which input features matter for a model output. In generative language models, however, it is often unclear what should count as a feature in the first place. In autoregressive language models, earlier generated tokens are both outputs of the model and inputs to later predictions. In diffusion language models, generation proceeds through iterative denoising or unmasking rather than fixed left-to-right prediction, so local explanation may target a state of diffusion rather than a next token. We argue that this ambiguity is not merely an implementation detail, but a conceptual limitation of carrying classifier-era feature attribution directly into generative language modeling. We introduce the Attribution Contract, a specification for feature-attribution claims that names what output is being explained, which features are eligible to receive attribution, what generative process is assumed, what is held fixed, and what model score is being attributed. The contract clarifies why the same attribution method can answer different questions depending on how it is instantiated. We argue that many disagreements about feature attribution in generative language models are not disagreements about attribution algorithms, but about unstated explanatory contracts. Using autoregressive and diffusion language models as case studies, we show when attribution to earlier generated tokens, intermediate states, or denoising stages is informative, when it is misleading, and why feature-attribution methods in generative language models should be evaluated as method-contract pairs.
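
The five items the abstract says a contract names can be sketched as a small record type. This is a minimal illustration only: the class names, field names, example values, and the placeholder method label are all assumptions made here and are not taken from the paper. It shows how one attribution method paired with two different contracts yields two different claims.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AttributionContract:
    """Hypothetical record of the five items named in the abstract."""
    explained_output: str               # what output is being explained
    eligible_features: Tuple[str, ...]  # which features may receive attribution
    generative_process: str             # what generative process is assumed
    held_fixed: Tuple[str, ...]         # what is held fixed
    attributed_score: str               # what model score is being attributed


@dataclass(frozen=True)
class MethodContractPair:
    """Hypothetical unit of evaluation: a method together with its contract."""
    method: str
    contract: AttributionContract


# Illustrative contract: only prompt tokens are eligible features, and the
# earlier generated tokens are treated as fixed context.
prompt_only = AttributionContract(
    explained_output="token at position t",
    eligible_features=("prompt tokens",),
    generative_process="autoregressive",
    held_fixed=("earlier generated tokens",),
    attributed_score="log-probability of the token at position t",
)

# Illustrative variant: earlier generated tokens also become eligible features.
prompt_and_prefix = replace(
    prompt_only,
    eligible_features=("prompt tokens", "earlier generated tokens"),
    held_fixed=(),
)

# Same (placeholder) method name, two different contracts.
pair_a = MethodContractPair("gradient-x-input", prompt_only)
pair_b = MethodContractPair("gradient-x-input", prompt_and_prefix)

same_method = pair_a.method == pair_b.method        # method is shared
same_question = pair_a.contract == pair_b.contract  # contracts differ
```

Comparing `pair_a` and `pair_b` as whole pairs, rather than by method name alone, mirrors the abstract's point that the same attribution method can answer different questions depending on how it is instantiated.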

Related papers