Fugu-MT Paper Translation (Abstract): Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals

Paper summary: Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals

arxiv url: http://arxiv.org/abs/2106.00786v1
Date: Tue, 1 Jun 2021 20:36:48 GMT
Status: Translation complete
Last updated in system: 2021-06-03 14:52:15.289284
Title: Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals
Title (reference translation): Search Methods for Socially-Aligned Feature Importance Explanations Using In-Distribution Counterfactuals
Authors: Peter Hase, Harry Xie, Mohit Bansal
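Abstract summary: Feature importance (FI) estimates are a common form of explanation, typically created and evaluated by computing the change in model confidence that results from removing certain input features at test time. The paper examines several under-explored dimensions of FI-based explanations and offers conceptual and empirical improvements to this form of explanation.
Reference score (independently computed attention score): 72.00815192668193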
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways. To resolve this issue, we propose a simple alteration to the model training process, which results in more socially aligned explanations and metrics. Second, we compare among five approaches for removing features from model inputs. We find that some methods produce more OOD counterfactuals than others, and we make recommendations for selecting a feature-replacement function. Finally, we introduce four search-based methods for identifying FI explanations and compare them to strong baselines, including LIME, Integrated Gradients, and random search. On experiments with six diverse text classification datasets, we find that the only method that consistently outperforms random search is a Parallel Local Search that we introduce. Improvements over the second-best method are as large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All supporting code is publicly available at https://github.com/peterbhase/ExplanationSearch.
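Abstract (reference translation): Feature importance (FI) estimates are a common form of explanation, and they are typically created and evaluated by computing the change in model confidence that results from removing certain input features at test time. For example, the standard Sufficiency metric keeps only the top-k most important tokens. In this paper, we examine several under-explored dimensions of FI-based explanations and describe conceptual and empirical improvements to this form of explanation. First, we advance a new argument for why removing features from an input can be problematic when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) for the model means that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways. To resolve this, we propose a simple change to the model training process that yields more socially aligned explanations and metrics. Second, we compare five approaches for removing features from model inputs. Some methods produce more OOD counterfactuals than others, and we make recommendations for choosing a feature-replacement function. Finally, we introduce four search-based methods for identifying FI explanations and compare them to strong baselines, including LIME, Integrated Gradients, and random search. In experiments on six diverse text classification datasets, the only method that consistently outperforms random search is the Parallel Local Search that we introduce. Improvements over the second-best method are as large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All supporting code is publicly available at https://github.com/peterbhase/ExplanationSearch.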
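
The Sufficiency and Comprehensiveness metrics mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common "confidence difference" formulation: Sufficiency compares the model's confidence on the full input with its confidence when only the top-k tokens are kept, and Comprehensiveness compares it with the confidence when the top-k tokens are removed. The names `toy_confidence`, `MASK`, and the keyword weights are hypothetical stand-ins, not taken from the paper or its repository, and replacing tokens with a mask string is only one possible feature-replacement function.

```python
import math
from typing import Callable, List, Sequence, Set

MASK = "[MASK]"  # hypothetical placeholder for a removed token (an assumption)


def toy_confidence(tokens: Sequence[str]) -> float:
    """Toy stand-in for a classifier: confidence in the 'positive' class."""
    weights = {"great": 2.0, "good": 1.0, "boring": -1.5}  # made-up weights
    logit = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def top_k_indices(scores: Sequence[float], k: int) -> Set[int]:
    """Indices of the k tokens with the highest importance scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def sufficiency(model: Callable[[Sequence[str]], float],
                tokens: Sequence[str], scores: Sequence[float], k: int) -> float:
    """Confidence on the full input minus confidence with only top-k kept.

    Lower values mean the kept tokens alone preserve the prediction.
    """
    keep = top_k_indices(scores, k)
    kept_only: List[str] = [t if i in keep else MASK for i, t in enumerate(tokens)]
    return model(tokens) - model(kept_only)


def comprehensiveness(model: Callable[[Sequence[str]], float],
                      tokens: Sequence[str], scores: Sequence[float], k: int) -> float:
    """Confidence on the full input minus confidence with top-k removed.

    Higher values mean the removed tokens mattered to the prediction.
    """
    drop = top_k_indices(scores, k)
    removed: List[str] = [MASK if i in drop else t for i, t in enumerate(tokens)]
    return model(tokens) - model(removed)


if __name__ == "__main__":
    tokens = ["a", "great", "but", "boring", "film"]
    scores = [0.0, 1.0, 0.0, 0.2, 0.0]  # hypothetical importance estimates
    print("sufficiency:", sufficiency(toy_confidence, tokens, scores, k=1))
    print("comprehensiveness:", comprehensiveness(toy_confidence, tokens, scores, k=1))
```

In this toy setup the masked inputs are exactly the kind of counterfactual the abstract flags as potentially out-of-distribution: a real model that never saw `[MASK]`-filled inputs during training may respond to them in ways that distort both metrics.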

Related papers