Fugu-MT Paper Translation (Summary): Decision-Aware Attention Propagation for Vision Transformer Explainability

Paper summary: Decision-Aware Attention Propagation for Vision Transformer Explainability

arXiv URL: http://arxiv.org/abs/2604.18094v1
Date: Mon, 20 Apr 2026 11:10:18 GMT
Status: Translation complete
Last updated (in system): 2026-04-21 21:52:52.827042
Title: Decision-Aware Attention Propagation for Vision Transformer Explainability
Title (reference translation): Decision-type attention propagation for vision transformer explainability
Authors: Sehyeong Jo, Gangjae Jang, Haesol Park
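Abstract summary: DAP (Decision-Aware Attention Propagation) is an attribution method that injects decision-relevant priors into transformer attention propagation. DAP produces attribution maps that are more class-sensitive, compact, and faithful than those of conventional attention-based methods.
Reference score (site-computed interest score): 1.9116784879310027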
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet their prediction process remains difficult to interpret because information is propagated through complex interactions across layers and attention heads. Existing attention-based explanation methods provide an intuitive way to trace information flow. However, they rely mainly on raw attention weights, which do not explicitly reflect the final decision and often lead to explanations with limited class discriminability. In contrast, gradient-based localization methods are more effective at highlighting class-specific evidence, but they do not fully exploit the hierarchical attention propagation mechanism of transformers. To address this limitation, we propose Decision-Aware Attention Propagation (DAP), an attribution method that injects decision-relevant priors into transformer attention propagation. By estimating token importance through gradient-based localization and integrating it into layer-wise attention rollout, the method captures both the structural flow of attention and the evidence most relevant to the final prediction. Consequently, DAP produces attribution maps that are more class-sensitive, compact, and faithful than those generated by conventional attention-based methods. Extensive experiments across Vision Transformer variants of different model scales show that DAP consistently outperforms existing baselines in both quantitative metrics and qualitative visualizations, indicating that decision-aware propagation is an effective direction for improving ViT interpretability.
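Abstract (reference translation): Vision Transformers (ViTs) have become a dominant architecture in computer vision, but their prediction process is difficult to interpret because information is propagated through complex interactions across layers and attention heads. Existing attention-based explanation methods offer an intuitive way to trace the flow of information. However, they rely mainly on raw attention weights, which do not explicitly reflect the final decision and often lead to explanations with limited class discriminability. In contrast, gradient-based localization methods are effective at highlighting class-specific evidence, but they do not fully exploit the hierarchical attention propagation mechanism of transformers. To address this limitation, we propose DAP (Decision-Aware Attention Propagation), an attribution method that injects decision-relevant priors into transformer attention propagation. By estimating token importance through gradient-based localization and integrating it into layer-wise attention rollout, DAP captures both the structural flow of attention and the evidence most relevant to the final prediction. As a result, DAP produces attribution maps that are more class-sensitive, compact, and faithful than those of conventional attention-based methods. Extensive experiments across Vision Transformer variants of different model scales show that DAP consistently outperforms existing baselines in both quantitative metrics and qualitative visualizations, indicating that decision-aware propagation is an effective direction for improving ViT interpretability.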
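The abstract describes DAP only at a high level: token importance from gradient-based localization is integrated into layer-wise attention rollout. The sketch below is not the authors' implementation. It assumes standard attention rollout (head-averaged attention mixed with the identity for the residual path, row-normalized, then multiplied across layers) and uses a Grad-CAM-style score over tokens as a stand-in "decision-relevant prior", applied by reweighting each layer's attention columns. The function names (`gradcam_token_prior`, `rollout`) and the column-reweighting rule are hypothetical choices made for illustration.

```python
from typing import List, Optional

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """n x n identity matrix (models the residual connection in rollout)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def normalize_rows(a: Matrix) -> Matrix:
    """Rescale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in a:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out


def mean_over_heads(heads: List[Matrix]) -> Matrix:
    """Average the per-head attention maps of one layer."""
    n = len(heads[0])
    return [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
            for i in range(n)]


def gradcam_token_prior(acts: Matrix, grads: Matrix) -> List[float]:
    """Hypothetical decision-relevant prior: Grad-CAM applied to tokens.

    acts and grads are [tokens][channels]: a layer's token activations and the
    gradient of the target class score with respect to them.
    """
    n_tok, n_ch = len(acts), len(acts[0])
    # Channel weights = gradients averaged over tokens (Grad-CAM style).
    weights = [sum(grads[t][c] for t in range(n_tok)) / n_tok for c in range(n_ch)]
    cam = [max(0.0, sum(weights[c] * acts[t][c] for c in range(n_ch)))
           for t in range(n_tok)]
    peak = max(cam)
    # Scale to [0, 1]; fall back to a uniform prior if nothing is positive.
    return [x / peak for x in cam] if peak > 0 else [1.0] * n_tok


def rollout(layers: List[List[Matrix]],
            priors: Optional[List[List[float]]] = None) -> List[float]:
    """Attention rollout; with priors, a hypothetical decision-aware variant.

    layers[l] holds the per-head [tokens][tokens] attention maps of layer l;
    token 0 is [CLS]. Returns the relevance of each patch token for [CLS].
    """
    n = len(layers[0][0])
    eye = identity(n)
    result = identity(n)
    for l, heads in enumerate(layers):
        attn = mean_over_heads(heads)
        if priors is not None:
            # Assumption: reweight attention toward tokens the prior marks as
            # decision-relevant, then renormalize each row.
            attn = normalize_rows([[attn[i][j] * priors[l][j] for j in range(n)]
                                   for i in range(n)])
        # Standard rollout: mix in identity for the residual path, renormalize.
        attn = normalize_rows([[0.5 * attn[i][j] + 0.5 * eye[i][j] for j in range(n)]
                               for i in range(n)])
        result = matmul(attn, result)  # accumulates A_l @ ... @ A_1
    return result[0][1:]


if __name__ == "__main__":
    # Toy example: [CLS] plus two patches, one layer with one uniform head.
    uniform = [[1 / 3] * 3 for _ in range(3)]
    print("plain rollout:", rollout([[uniform]]))
    print("with prior:   ", rollout([[uniform]], priors=[[1.0, 1.0, 0.0]]))
```

With a single uniform head, plain rollout spreads [CLS] relevance evenly over both patches, while the prior `[1.0, 1.0, 0.0]` shifts it onto the first patch, which illustrates the kind of class-discriminative concentration the abstract attributes to decision-aware propagation. In a real ViT, the attention maps, activations, and gradients would have to be captured from the model with forward and backward hooks, which this sketch does not cover.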


Related papers