Evaluating Explainable AI: Which Algorithmic Explanations Help Users
Predict Model Behavior?
- URL: http://arxiv.org/abs/2005.01831v1
- Date: Mon, 4 May 2020 20:35:17 GMT
- Title: Evaluating Explainable AI: Which Algorithmic Explanations Help Users
Predict Model Behavior?
- Authors: Peter Hase, Mohit Bansal
- Abstract summary: We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
- Score: 97.77183117452235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithmic approaches to interpreting machine learning models have
proliferated in recent years. We carry out human subject tests that are the
first of their kind to isolate the effect of algorithmic explanations on a key
aspect of model interpretability, simulatability, while avoiding important
confounding experimental factors. A model is simulatable when a person can
predict its behavior on new inputs. Through two kinds of simulation tests
involving text and tabular data, we evaluate five explanation methods: (1)
LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a
Composite approach that combines explanations from each method. Clear evidence
of method effectiveness is found in very few cases: LIME improves
simulatability in tabular classification, and our Prototype method is effective
in counterfactual simulation tests. We also collect subjective ratings of
explanations, but we do not find that ratings are predictive of how helpful
explanations are. Our results provide the first reliable and comprehensive
estimates of how explanations influence simulatability across a variety of
explanation methods and data domains. We show that (1) we need to be careful
about the metrics we use to evaluate explanation methods, and (2) there is
significant room for improvement in current methods. All our supporting code,
data, and models are publicly available at:
https://github.com/peterbhase/InterpretableNLP-ACL2020
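As a rough illustration of the simulation tests described above, the sketch below generates a LIME explanation for a tabular classifier (the setting where the paper reports LIME helps) and computes a simulatability effect as the change in how often a person's guesses match the model's outputs before versus after seeing explanations. This is a minimal sketch under assumptions, not the paper's protocol: the study uses human subjects and its own datasets, whereas the scikit-learn model, the breast-cancer data, and the random stand-in "user guess" arrays here are hypothetical placeholders.

```python
# Sketch only: the paper's tests use human subjects; the dataset, model,
# and "user guess" arrays below are hypothetical stand-ins used to
# illustrate the quantities involved.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(random_state=0).fit(X, y)

# 1) Produce a LIME explanation for one tabular instance.
explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # top weighted features for this prediction

# 2) Simulatability effect: how much better people predict the MODEL's output
#    after seeing explanations than before. Both guess arrays are placeholders.
model_outputs = model.predict(X[:20])
pre_user_guesses = np.random.randint(0, 2, size=20)   # guesses without explanations
post_user_guesses = np.random.randint(0, 2, size=20)  # guesses after explanations
effect = (post_user_guesses == model_outputs).mean() - (pre_user_guesses == model_outputs).mean()
print(f"Estimated simulatability effect: {effect:+.2f}")
```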
Related papers
- Interpretability in Symbolic Regression: a benchmark of Explanatory Methods using the Feynman data set [0.0]
The interpretability of machine learning models plays a role as important as model accuracy.
This paper proposes a benchmark scheme to evaluate explanatory methods to explain regression models.
Results have shown that Symbolic Regression models can be an interesting alternative to white-box and black-box models.
arXiv Detail & Related papers (2024-04-08T23:46:59Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? [2.7558542803110244]
We propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations.
We find that the post-hoc technique SHAP and the inherently interpretable Attri-Net provide the best performance.
arXiv Detail & Related papers (2023-07-23T14:43:17Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness of MACE, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Evaluation of Local Model-Agnostic Explanations Using Ground Truth [4.278336455989584]
Explanation techniques are commonly evaluated using human-grounded methods.
We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques.
arXiv Detail & Related papers (2021-06-04T13:47:31Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
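The leakage-adjusted simulatability (LAS) entry directly above describes a metric that rewards explanations for helping an observer predict the model's output while controlling for explanations that simply restate (leak) that output. The sketch below is one rough reading of that idea, not necessarily the cited paper's exact formulation: examples are split by whether the explanation alone already reveals the model's prediction, and the explanation's accuracy gain is averaged across the leaking and non-leaking groups so that trivially label-leaking explanations do not dominate the score. All function and variable names here are hypothetical.

```python
# Rough sketch of a leakage-adjusted simulatability score; the grouping and
# averaging follow one reading of the LAS idea, not necessarily the cited
# paper's exact definition. All inputs are hypothetical label arrays.
import numpy as np

def las_score(model_out, sim_with_expl, sim_without_expl, sim_from_expl_only):
    """All arguments are arrays of predicted labels, one entry per example.

    model_out          -- the model's actual outputs (what the simulator tries to predict)
    sim_with_expl      -- simulator predictions given input + explanation
    sim_without_expl   -- simulator predictions given input only (baseline)
    sim_from_expl_only -- simulator predictions given the explanation alone
                          (used to flag label leakage)
    """
    leaking = sim_from_expl_only == model_out  # explanation alone reveals the output
    effects = []
    for group in (leaking, ~leaking):
        if group.any():
            gain = ((sim_with_expl[group] == model_out[group]).mean()
                    - (sim_without_expl[group] == model_out[group]).mean())
            effects.append(gain)
    # Average the gain over the leaking and non-leaking subsets.
    return float(np.mean(effects))

# Hypothetical usage with random stand-in predictions
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
print(las_score(y, rng.integers(0, 2, 100), rng.integers(0, 2, 100), rng.integers(0, 2, 100)))
```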