Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for
Post-Hoc Interpretability
- URL: http://arxiv.org/abs/2009.07494v1
- Date: Wed, 16 Sep 2020 06:38:03 GMT
- Title: Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for
Post-Hoc Interpretability
- Authors: Ninghao Liu, Yunsong Meng, Xia Hu, Tie Wang, Bo Long
- Abstract summary: We propose that a concrete definition of interpretation is needed before we can evaluate the faithfulness of an interpretation.
We find that although interpretation methods perform differently under a given evaluation metric, such a difference may not result from interpretation quality or faithfulness.
- Score: 54.85658598523915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed an increasing number of interpretation methods
being developed to improve the transparency of NLP models. Meanwhile,
researchers have also tried to answer the question of whether the obtained
interpretations faithfully explain the mechanisms behind model predictions.
Specifically, Jain and Wallace (2019) propose that "attention is not
explanation" by comparing attention interpretations with gradient-based alternatives.
However, this raises a new question: can we safely pick one interpretation
method as the ground truth? If not, on what basis can we compare different
interpretation methods? In this work, we propose that it is crucial to have a
concrete definition of interpretation before we can evaluate the faithfulness of
an interpretation. The definition affects both the algorithm used to obtain the
interpretation and, more importantly, the metric used in evaluation. Through
both theoretical and experimental analysis, we find that although
interpretation methods perform differently under a given evaluation metric,
such a difference may not result from interpretation quality or faithfulness,
but rather from the inherent bias of the evaluation metric.
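To make that last point concrete, here is a minimal, hypothetical sketch (not code from the paper): a toy linear classifier with two competing token rankings, a signed gradient-times-input ranking and an attention-like ranking, scored by a deletion-style removal metric. The model, the attention stand-in, and the `deletion_drop` metric are all illustrative assumptions.

```python
# Hedged sketch (not the paper's code): two competing token rankings for a
# toy linear classifier, scored with a deletion-style metric. Everything
# here (model, attention stand-in, metric) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
V = 20                                        # toy vocabulary size
w = rng.normal(size=V)                        # weights of a linear classifier
x = rng.integers(0, 2, size=V).astype(float)  # bag-of-words input

def predict(x):
    """Sigmoid score of the linear model."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# Ranking A: signed gradient-times-input, which for this linear model equals
# each token's exact contribution to the logit.
grad_rank = w * x

# Ranking B: a hypothetical attention-like score from a separate vector
# that is only loosely correlated with the model weights.
a = w + rng.normal(scale=1.5, size=V)
attn_rank = np.where(x > 0, np.exp(a) / np.exp(a).sum(), -np.inf)

def deletion_drop(ranking, k=3):
    """Deletion metric: remove the k highest-ranked tokens and report the
    change in the model's score (a common removal-based faithfulness test)."""
    x_del = x.copy()
    x_del[np.argsort(-ranking)[:k]] = 0.0
    return abs(predict(x) - predict(x_del))

print("gradient ranking drop :", deletion_drop(grad_rank))
print("attention ranking drop:", deletion_drop(attn_rank))
# The deletion metric rewards rankings aligned with per-token contributions,
# which the gradient ranking matches by construction here; a large gap can
# therefore reflect the metric's definition rather than faithfulness per se.
```

Because gradient-times-input equals each token's exact logit contribution in a linear model, the removal metric favors it by construction; the gap says as much about the metric as about either interpretation.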
Related papers
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
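A rough illustration of that iterative scheme (a sketch under assumed interfaces, not the authors' code): feed a counterfactual editor its own output and record the label trajectory. The `predict` and `edit_counterfactual` callables below are hypothetical stand-ins.

```python
# Hedged sketch of a back-translation-style loop for counterfactual editors.
# `predict` and `edit_counterfactual` are hypothetical stand-ins for a real
# classifier and explainer; only the looping logic follows the idea above.
from typing import Callable, List, Tuple

def iterate_counterfactuals(
    text: str,
    predict: Callable[[str], str],
    edit_counterfactual: Callable[[str], str],
    rounds: int = 4,
) -> List[Tuple[str, str]]:
    """Repeatedly feed the editor its own counterfactual, recording the
    (text, predicted label) trajectory; a consistent editor should keep
    flipping the label while staying close to the original text."""
    trajectory = [(text, predict(text))]
    for _ in range(rounds):
        text = edit_counterfactual(text)  # minimal edit meant to flip the label
        trajectory.append((text, predict(text)))
    return trajectory

# Toy usage with trivial stand-ins:
toy_predict = lambda t: "negative" if "not" in t else "positive"
toy_edit = lambda t: t.replace("not ", "") if "not" in t else "not " + t
for step, (t, y) in enumerate(iterate_counterfactuals("a good movie", toy_predict, toy_edit)):
    print(step, y, "|", t)
```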
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Counterfactual Evaluation for Explainable AI [21.055319253405603]
We propose a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective.
We introduce two algorithms to find proper counterfactuals in both discrete and continuous scenarios, and then use the acquired counterfactuals to measure faithfulness.
arXiv Detail & Related papers (2021-09-05T01:38:49Z)
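For intuition, a hedged sketch of the continuous case for a linear model, where the minimal counterfactual has a closed form (projection onto the decision boundary); the agreement-style `faithfulness` score is an illustrative assumption, not the paper's algorithm.

```python
# Hedged sketch: a continuous-case counterfactual for a linear classifier
# (closed-form boundary projection) used to score an explanation. This is
# an illustrative recipe, not the paper's two algorithms.
import numpy as np

w, b = np.array([2.0, -1.0, 0.5]), -0.5   # toy linear model f(x) = w.x + b
x = np.array([1.0, 0.3, -0.2])

def counterfactual(x):
    """Minimal L2 perturbation moving x just past the decision boundary."""
    margin = w @ x + b
    return x - (margin / (w @ w)) * w * 1.01  # slight overshoot to flip the sign

def faithfulness(explanation, x, x_cf, top_k=1):
    """Fraction of the explanation's top-k features that the counterfactual
    changed the most, one simple agreement-style score."""
    changed = np.argsort(-np.abs(x_cf - x))[:top_k]
    claimed = np.argsort(-np.abs(explanation))[:top_k]
    return len(set(changed) & set(claimed)) / top_k

x_cf = counterfactual(x)
print("flipped:", np.sign(w @ x + b) != np.sign(w @ x_cf + b))
print("score for gradient-x-input explanation:", faithfulness(w * x, x, x_cf))
```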
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To evaluate such interpretations, we start from three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desiderata of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial robustness domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
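The three criteria can be sketched as generic checks over an assumed `explain(predict, x)` interface; the formulas below are illustrative stand-ins rather than the paper's definitions.

```python
# Hedged sketch of the three criteria named above, phrased as generic checks
# over an assumed `explain(predict, x)` interface; the exact formulas are
# illustrative assumptions, not the paper's definitions.
import numpy as np

def removal_score(predict, explain, x, k=2):
    """Removal-based criterion: deleting the top-k attributed features
    should change the prediction substantially (higher = better)."""
    s = explain(predict, x)
    x_del = x.copy()
    x_del[np.argsort(-np.abs(s))[:k]] = 0.0
    return abs(predict(x) - predict(x_del))

def sensitivity_score(predict, explain, x, eps=1e-2, trials=8):
    """Sensitivity: how much the explanation moves under small input
    perturbations (lower = less sensitive)."""
    s = explain(predict, x)
    rng = np.random.default_rng(0)
    return max(np.linalg.norm(explain(predict, x + eps * rng.normal(size=x.shape)) - s)
               for _ in range(trials))

def stability_score(explain, predict_a, predict_b, x):
    """Stability: two near-identical models should yield near-identical
    explanations (lower = more stable)."""
    return np.linalg.norm(explain(predict_a, x) - explain(predict_b, x))

# Toy usage: a linear model and a finite-difference explainer.
w = np.array([1.5, -2.0, 0.3, 0.0])
predict = lambda x: float(w @ x)
explain = lambda f, x: np.array([(f(x + 1e-4 * e) - f(x)) / 1e-4
                                 for e in np.eye(len(x))])
print(removal_score(predict, explain, np.array([1.0, 1.0, 0.0, 1.0])))
```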
- Where and What? Examining Interpretable Disentangled Representations [96.32813624341833]
Capturing interpretable variations has long been one of the goals in disentanglement learning.
Unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting.
In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted.
arXiv Detail & Related papers (2021-04-07T11:22:02Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability [30.501012698482423]
There has been no formal study of the statistical cost of interpretability in machine learning.
We model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses.
We perform a case analysis, explaining why one may or may not observe a trade-off between accuracy and interpretability, depending on whether the restriction to interpretable classifiers comes at the cost of excess statistical risk.
arXiv Detail & Related papers (2020-10-26T17:52:34Z)
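In symbols, a standard formulation consistent with that summary (notation assumed here for illustration, not taken from the paper):

```latex
% ERM over a restricted, interpretable hypothesis class (notation assumed):
\[
  \hat{h}_I \;=\; \arg\min_{h \in \mathcal{H}_I \subseteq \mathcal{H}}
  \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(h(x_i),\, y_i\bigr)
\]
% Any accuracy/interpretability trade-off shows up in the excess-risk split:
\[
  R(\hat{h}_I) - R^\ast \;=\;
  \underbrace{\Bigl(\inf_{h \in \mathcal{H}_I} R(h) - R^\ast\Bigr)}_{\text{approximation cost of interpretability}}
  \;+\;
  \underbrace{\Bigl(R(\hat{h}_I) - \inf_{h \in \mathcal{H}_I} R(h)\Bigr)}_{\text{estimation error}}
\]
```

If the interpretable class is rich enough that the first term vanishes, the restriction costs nothing asymptotically; otherwise the trade-off is real, which matches the case analysis described above.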
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? [58.13152510843004]
With the growing popularity of deep-learning-based NLP models comes a need for interpretable systems.
What is interpretability, and what constitutes a high-quality interpretation?
We call for more clearly differentiating between the different desired criteria an interpretation should satisfy, and focus on the faithfulness criterion.
arXiv Detail & Related papers (2020-04-07T20:15:28Z)