Fugu-MT Paper Translation (Abstract): Causal Attribution via Activation Patching

Paper overview: Causal Attribution via Activation Patching

arxiv url: http://arxiv.org/abs/2603.13652v1
Date: Fri, 13 Mar 2026 23:25:49 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.316234
Title: Causal Attribution via Activation Patching
Title (reference translation): Causal Attribution via Activation Patching
Authors: Amirmohammad Izadi, Mohammadali Banayeeanzade, Alireza Mirrokni, Hosein Hasani, Mobin Bagherian, Faridoun Mehri, Mahdieh Soleymani Baghshah
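Abstract summary: We propose Causal Attribution via Activation Patching (CAAP) for Vision Transformers (ViTs). CAAP estimates the contribution of individual image patches to a ViT's prediction by intervening directly on internal activations. The resulting attribution map reflects the causal effect of patch-associated internal representations on the model's prediction.
Reference score (independently computed attention score): 11.144828411529495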
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Attribution methods for Vision Transformers (ViTs) aim to identify image regions that influence model predictions, but producing faithful and well-localized attributions remains challenging. Existing gradient-based and perturbation-based techniques often fail to isolate the causal contribution of internal representations associated with individual image patches. The key challenge is that class-relevant evidence is formed through interactions between patch tokens across layers, and input-level perturbations can be poor proxies for patch importance, since they may fail to reconstruct the internal evidence actually used by the model. We propose Causal Attribution via Activation Patching (CAAP), which estimates the contribution of individual image patches to the ViT's prediction by directly intervening on internal activations rather than using learned masks or synthetic perturbation patterns. For each patch, CAAP inserts the corresponding source-image activations into a neutral target context over an intermediate range of layers and uses the resulting target-class score as the attribution signal. The resulting attribution map reflects the causal effect of patch-associated internal representations on the model's prediction. The causal intervention serves as a principled measure of patch influence by capturing class-relevant evidence after initial representation formation, while avoiding late-layer global mixing that can reduce spatial specificity. Across multiple ViT backbones and standard metrics, CAAP significantly outperforms existing methods and produces more faithful and localized attributions.
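Abstract (reference translation): Attribution methods for Vision Transformers (ViTs) aim to identify the image regions that influence model predictions, but producing faithful, well-localized attributions remains difficult. Existing gradient-based and perturbation-based methods often fail to isolate the causal contribution of the internal representations associated with individual image patches. The key challenge is that class-relevant evidence is formed through interactions between patch tokens across layers, so input-level perturbations are a poor proxy for patch importance. We propose Causal Attribution via Activation Patching (CAAP), which estimates the contribution of individual image patches to a ViT's prediction by intervening directly on internal activations rather than using learned masks or synthetic perturbation patterns. For each patch, CAAP inserts the corresponding source-image activations into a neutral target context over an intermediate range of layers and uses the resulting target-class score as the attribution signal. The resulting attribution map reflects the causal effect of patch-associated internal representations on the model's prediction. The causal intervention captures class-relevant evidence after initial representation formation and avoids late-layer global mixing that can reduce spatial specificity, so it serves as a principled measure of patch influence. Across multiple ViT backbones and standard metrics, CAAP significantly outperforms existing methods and produces more faithful and localized attributions.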
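The per-patch procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the model below is a hypothetical stand-in (one scalar per patch token, a mean-mixing layer with tanh), the all-zero neutral context and the layer range [2, 4) are assumptions chosen for illustration, and names such as toy_layer and patching_attribution are invented here. It only shows the general shape of the idea: cache the source activations, patch one token's activations into a neutral run over an intermediate range of layers, and read off the class score.

```python
import math

def toy_layer(tokens, mix=0.3):
    """Hypothetical stand-in for a ViT block: every token receives a share
    of the token mean (crude token mixing), followed by a tanh."""
    mean = sum(tokens) / len(tokens)
    return [math.tanh(t + mix * mean) for t in tokens]

def run(tokens, n_layers, patch=None):
    """Run the toy model and return (class_score, per-layer input activations).

    patch = (index, start, end, cached) overwrites token `index` with
    cached[layer][index] at the input of every layer in [start, end).
    """
    x = list(tokens)
    acts = []
    for layer_idx in range(n_layers):
        if patch is not None:
            index, start, end, cached = patch
            if start <= layer_idx < end:
                x[index] = cached[layer_idx][index]
        acts.append(list(x))
        x = toy_layer(x)
    score = sum(x) / len(x)  # toy "target-class score": mean of final tokens
    return score, acts

def patching_attribution(source, neutral, n_layers=6, start=2, end=4):
    """Score each patch by patching its source activations into the neutral run."""
    _, cached = run(source, n_layers)  # cache source activations once
    return [
        run(neutral, n_layers, patch=(i, start, end, cached))[0]
        for i in range(len(source))
    ]

source = [0.0, 2.0, 0.0, 0.5]   # one scalar per patch token; token 1 carries most evidence
neutral = [0.0, 0.0, 0.0, 0.0]  # assumed neutral context (all zeros)
attribution = patching_attribution(source, neutral)
print(attribution)
```

In this toy setting, token 1 receives the largest score and token 3 the next largest. Tokens 0 and 2 start at zero in the source but still receive small positive scores, because by layer 2 their activations have already absorbed evidence from the other tokens through mixing. With a real ViT, the same pattern would typically be implemented with forward hooks on the transformer blocks; that wiring is omitted here.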

Related papers