What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation
Framework for Explainability Methods
- URL: http://arxiv.org/abs/2112.04417v1
- Date: Mon, 6 Dec 2021 18:36:09 GMT
- Title: What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation
Framework for Explainability Methods
- Authors: Thomas Fel, Julien Colin, Remi Cadene, Thomas Serre
- Abstract summary: We show that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios.
Our results suggest a critical need to develop better explainability methods and to deploy human-centered evaluation approaches.
- Score: 6.232071870655069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A multitude of explainability methods and theoretical evaluation scores have
been proposed. However, it is not yet known: (1) how useful these methods are
in real-world scenarios and (2) how well theoretical measures predict the
usefulness of these methods for practical use by a human. To fill this gap, we
conducted human psychophysics experiments at scale to evaluate the ability of
human participants (n=1,150) to leverage representative attribution methods to
learn to predict the decision of different image classifiers. Our results
demonstrate that theoretical measures used to score explainability methods
poorly reflect the practical usefulness of individual attribution methods in
real-world scenarios. Furthermore, the degree to which individual attribution
methods helped human participants predict classifiers' decisions varied widely
across categorization tasks and datasets.
Overall, our results highlight fundamental challenges for the field --
suggesting a critical need to develop better explainability methods and to
deploy human-centered evaluation approaches. We will make the code of our
framework available to ease the systematic evaluation of novel explainability
methods.
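As a rough illustration of the protocol described above, the sketch below computes one plausible usefulness statistic: the gain in participants' accuracy at predicting the classifier's decisions when attribution-based explanations are shown, relative to a no-explanation baseline. The function name `explanation_utility` and the toy data are hypothetical and do not come from the authors' released framework.

```python
import numpy as np

def explanation_utility(preds_with_expl, preds_baseline, classifier_decisions):
    """Hypothetical usefulness score: the gain in participants' accuracy at
    predicting a classifier's decisions when explanations are shown,
    relative to a no-explanation baseline (illustrative sketch only)."""
    preds_with_expl = np.asarray(preds_with_expl)
    preds_baseline = np.asarray(preds_baseline)
    decisions = np.asarray(classifier_decisions)

    acc_with = (preds_with_expl == decisions).mean()  # accuracy with explanations
    acc_base = (preds_baseline == decisions).mean()   # accuracy without explanations
    return acc_with - acc_base                        # > 0 means explanations helped

# Toy usage: participants guess which class the classifier predicted (1 or 0).
decisions = [1, 0, 1, 1, 0, 1]
with_expl = [1, 0, 1, 0, 0, 1]  # 5/6 correct with explanations
baseline  = [1, 1, 0, 0, 0, 1]  # 3/6 correct without explanations
print(explanation_utility(with_expl, baseline, decisions))  # ~0.33
```

Comparing such a score across attribution methods, classifiers, and datasets is one way to operationalize the finding that practical usefulness varies widely and is poorly predicted by theoretical faithfulness measures.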
Related papers
- It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation [15.8765167340819]
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation in tasks such as data annotation and system assessment.
Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations.
This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem.
arXiv Detail & Related papers (2023-09-30T20:54:59Z)
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- On The Coherence of Quantitative Evaluation of Visual Explanations [0.7212939068975619]
Evaluation methods have been proposed to assess the "goodness" of visual explanations.
We study a subset of the ImageNet-1k validation set where we evaluate a number of different commonly-used explanation methods.
Results of our study suggest a lack of coherence in the scores produced by some of the considered evaluation methods.
arXiv Detail & Related papers (2023-02-14T13:41:57Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
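The entry above mentions a post-processing smoothing step. As a generic illustration (not necessarily the authors' exact procedure), the sketch below Gaussian-smooths a 2D attribution map using SciPy; the helper name `smooth_attribution` is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_attribution(attribution_map, sigma=2.0):
    """Gaussian-smooth a 2D attribution (saliency) map; a generic sketch of
    the kind of post-processing smoothing mentioned above."""
    return gaussian_filter(np.asarray(attribution_map, dtype=float), sigma=sigma)

# Toy usage on a random 8x8 "saliency" map.
rng = np.random.default_rng(0)
raw = rng.random((8, 8))
print(smooth_attribution(raw, sigma=1.0).shape)  # (8, 8)
```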
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- On the Interaction of Belief Bias and Explanations [4.211128681972148]
We provide an overview of belief bias, its role in human evaluation, and ideas for NLP practitioners on how to account for it.
We show that conclusions about the highest performing methods change when introducing such controls, pointing to the importance of accounting for belief bias in evaluation.
arXiv Detail & Related papers (2021-06-29T12:49:42Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
arXiv Detail & Related papers (2021-01-26T17:11:40Z)
- On quantitative aspects of model interpretability [0.0]
We argue that methods along these dimensions can be attributed to two conceptual parts, namely the extractor and the actual explainability method.
We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in the selection of the most appropriate method for the task at hand.
arXiv Detail & Related papers (2020-07-15T10:05:05Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.