Fugu-MT Paper Translation (Abstract): Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing

Paper overview: Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing

arxiv url: http://arxiv.org/abs/2603.07302v1
Date: Sat, 07 Mar 2026 18:00:05 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.216456
Title: Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing
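Title (reference translation): Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing
Authors: Dipkamal Bhusal, Md Tanvirul Alam, Nidhi Rastogi
Abstract summary: The authors propose a lightweight feature-map smoothing block that applies a differentiable Gaussian filter in an intermediate layer. Across FMNIST, CIFAR-10, and ImageNette, the method preserves the sparsity benefits of adversarial training while improving both input-side and output-side stability.
Reference score (independently computed attention score): 4.014524824655106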
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Gradient-based saliency methods such as Vanilla Gradient (VG) and Integrated Gradients (IG) are widely used to explain image classifiers, yet the resulting maps are often noisy and unstable, limiting their usefulness in high-stakes settings. Most prior work improves explanations by modifying the attribution algorithm, leaving open how the training procedure shapes explanation quality. We take a training-centered view and first provide a curvature-based analysis linking attribution stability to how smoothly the input-gradient field varies locally. Guided by this connection, we study adversarial training and identify a consistent trade-off: it yields sparser and more input-stable saliency maps, but can degrade output-side stability, causing explanations to change even when predictions remain unchanged and logits vary only slightly. To mitigate this, we propose augmenting adversarial training with a lightweight feature-map smoothing block that applies a differentiable Gaussian filter in an intermediate layer. Across FMNIST, CIFAR-10, and ImageNette, our method preserves the sparsity benefits of adversarial training while improving both input-side stability and output-side stability. A human study with 65 participants further shows that smoothed adversarial saliency maps are perceived as more sufficient and trustworthy. Overall, our results demonstrate that explanation quality is critically shaped by training, and that simple smoothing with robust training provides a practical path toward saliency maps that are both sparse and stable.
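Abstract (reference translation): Gradient-based methods such as Vanilla Gradient (VG) and Integrated Gradients (IG) are widely used to explain image classifiers, but the resulting maps are noisy and unstable, which limits their usefulness in high-stakes settings. Most previous work improves explanations by changing the attribution algorithm and leaves open how the training procedure shapes explanation quality. Taking a training-centered view, the authors first present a curvature-based analysis that links attribution stability to how smooth the input-gradient field is locally. Guided by this connection, they study adversarial training and identify a consistent trade-off: it produces sparser and more input-stable saliency maps, but it can reduce output-side stability, so that explanations change even when predictions stay the same and logits change only slightly. To mitigate this, they propose augmenting adversarial training with a lightweight feature-map smoothing block that applies a differentiable Gaussian filter in an intermediate layer. Across FMNIST, CIFAR-10, and ImageNette, the method preserves the sparsity benefits of adversarial training while improving both input-side and output-side stability. A human study with 65 participants shows that smoothed adversarial saliency maps are perceived as more sufficient and more trustworthy. These results suggest that explanation quality is decisively shaped by training, and that simple smoothing combined with robust training is a practical path toward saliency maps that are both sparse and stable.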
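
The abstract describes the smoothing block only at a high level, so the following is a minimal sketch of the underlying operation, not the authors' implementation. It builds a normalized Gaussian kernel and convolves it with a single 2D feature-map channel. The function names `gaussian_kernel` and `smooth_feature_map`, the 3x3 kernel size, `sigma=1.0`, and the replicate padding at the borders are all assumptions made for illustration. Because the filter is a fixed linear map, the same operation written in an autodiff framework (for example as a depthwise convolution with frozen weights) would let gradients pass through it.

```python
import math


def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (size assumed odd)."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    # Normalize so the weights sum to 1 and the filter preserves the mean.
    return [[v / total for v in row] for row in raw]


def smooth_feature_map(fmap, kernel):
    """Convolve one 2D feature-map channel with the kernel.

    Borders use replicate padding (an assumption of this sketch).
    """
    h, w = len(fmap), len(fmap[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    # Clamp indices so border pixels reuse the nearest value.
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + half][dj + half] * fmap[ii][jj]
            out[i][j] = acc
    return out


# Hypothetical 5x5 activation with a single spike in the centre.
kernel = gaussian_kernel(size=3, sigma=1.0)
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
smoothed = smooth_feature_map(spike, kernel)
```

In this toy case the isolated spike is spread over its 3x3 neighbourhood while the total activation is unchanged, which is the kind of local averaging a Gaussian filter applies to high-frequency variation in a feature map.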

Related papers