Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals
- URL: http://arxiv.org/abs/2106.00786v1
- Date: Tue, 1 Jun 2021 20:36:48 GMT
- Title: Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals
- Authors: Peter Hase, Harry Xie, Mohit Bansal
- Abstract summary: Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
- Score: 72.00815192668193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature importance (FI) estimates are a popular form of explanation, and they
are commonly created and evaluated by computing the change in model confidence
caused by removing certain input features at test time. For example, in the
standard Sufficiency metric, only the top-k most important tokens are kept. In
this paper, we study several under-explored dimensions of FI-based
explanations, providing conceptual and empirical improvements for this form of
explanation. First, we advance a new argument for why it can be problematic to
remove features from an input when creating or evaluating explanations: the
fact that these counterfactual inputs are out-of-distribution (OOD) to models
implies that the resulting explanations are socially misaligned. The crux of
the problem is that the model prior and random weight initialization influence
the explanations (and explanation metrics) in unintended ways. To resolve this
issue, we propose a simple alteration to the model training process, which
results in more socially aligned explanations and metrics. Second, we compare
five approaches for removing features from model inputs. We find that
some methods produce more OOD counterfactuals than others, and we make
recommendations for selecting a feature-replacement function. Finally, we
introduce four search-based methods for identifying FI explanations and compare
them to strong baselines, including LIME, Integrated Gradients, and random
search. On experiments with six diverse text classification datasets, we find
that the only method that consistently outperforms random search is a Parallel
Local Search that we introduce. Improvements over the second-best method are as
large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All
supporting code is publicly available at
https://github.com/peterbhase/ExplanationSearch.
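For readers unfamiliar with the erasure-based metrics the abstract refers to, the sketch below illustrates how Sufficiency and Comprehensiveness are commonly computed in the ERASER style: keep (or remove) the top-k tokens ranked by the FI estimate, re-run the model, and measure the change in confidence for the originally predicted class. This is a minimal illustration, not the paper's implementation; the model_confidence callable and the "[MASK]" replacement token are assumptions standing in for whatever classifier and feature-replacement function are used.

    from typing import Callable, List

    def top_k_indices(importance: List[float], k: int) -> set:
        # Indices of the k tokens with the highest importance scores.
        return set(sorted(range(len(importance)),
                          key=lambda i: importance[i], reverse=True)[:k])

    def sufficiency(tokens: List[str], importance: List[float], k: int,
                    model_confidence: Callable[[List[str]], float],
                    mask_token: str = "[MASK]") -> float:
        # Drop in confidence when only the top-k most important tokens are kept;
        # smaller values mean the selected tokens are closer to sufficient.
        keep = top_k_indices(importance, k)
        reduced = [t if i in keep else mask_token for i, t in enumerate(tokens)]
        return model_confidence(tokens) - model_confidence(reduced)

    def comprehensiveness(tokens: List[str], importance: List[float], k: int,
                          model_confidence: Callable[[List[str]], float],
                          mask_token: str = "[MASK]") -> float:
        # Drop in confidence when the top-k most important tokens are removed;
        # larger values mean the selected tokens carried more of the evidence.
        remove = top_k_indices(importance, k)
        reduced = [mask_token if i in remove else t for i, t in enumerate(tokens)]
        return model_confidence(tokens) - model_confidence(reduced)

The abstract's concern about out-of-distribution counterfactuals applies directly to such code: the masked inputs passed to model_confidence may lie outside the training distribution, which is what the proposed training alteration and the choice of feature-replacement function are meant to address.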
Related papers
- ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness [67.49087159888298]
ReCEval is a framework that evaluates reasoning chains via two key properties: correctness and informativeness.
We show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods.
arXiv Detail & Related papers (2023-04-21T02:19:06Z)
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a black-box fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? [34.18339528128342]
An underlying principle behind dozens of explanation methods is to take the prediction difference between before-and-after an input feature is removed as its attribution.
A recent method called Input Marginalization (IM) uses BERT to replace a token, yielding more plausible counterfactuals.
However, our rigorous evaluation using five metrics on three datasets found IM explanations to be consistently more biased, less accurate, and less plausible than those derived from simply deleting a word.
arXiv Detail & Related papers (2021-10-22T17:22:05Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.