Towards Faithfully Interpretable NLP Systems: How should we define and
evaluate faithfulness?
- URL: http://arxiv.org/abs/2004.03685v3
- Date: Mon, 27 Apr 2020 20:44:37 GMT
- Title: Towards Faithfully Interpretable NLP Systems: How should we define and
evaluate faithfulness?
- Authors: Alon Jacovi, Yoav Goldberg
- Abstract summary: With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems.
What is interpretability, and what constitutes a high-quality interpretation?
We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criteria.
- Score: 58.13152510843004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing popularity of deep-learning based NLP models, comes a need
for interpretable systems. But what is interpretability, and what constitutes a
high-quality interpretation? In this opinion piece we reflect on the current
state of interpretability evaluation research. We call for more clearly
differentiating between different desired criteria an interpretation should
satisfy, and focus on the faithfulness criteria. We survey the literature with
respect to faithfulness evaluation, and arrange the current approaches around
three assumptions, providing an explicit form to how faithfulness is "defined"
by the community. We provide concrete guidelines on how evaluation of
interpretation methods should and should not be conducted. Finally, we claim
that the current binary definition for faithfulness sets a potentially
unrealistic bar for being considered faithful. We call for discarding the
binary notion of faithfulness in favor of a more graded one, which we believe
will be of greater practical utility.
Related papers
- Evaluating AI Group Fairness: a Fuzzy Logic Perspective [9.876903282565976]
What constitutes group fairness depends on who is asked and the social context, whereas definitions are often relaxed to accept small deviations from the statistical constraints they set out to impose.
Here we decouple definitions of group fairness from the context and from relaxation-related uncertainty by expressing them in the axiomatic system of Basic fuzzy Logic.
We show that commonly held propositions standardize the resulting mathematical formulas and we transcribe logic and truth value choices to layperson terms.
arXiv Detail & Related papers (2024-06-27T07:11:48Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data [17.9926469947157]
We propose two formal definitions of faithfulness for feature attribution methods.
Qualitative faithfulness demands that scores reflect the true qualitative effect (positive vs. negative) of the feature on the model.
We experimentally demonstrate that popular attribution methods can fail to give faithful explanations in the setting where the data is continuous.
arXiv Detail & Related papers (2022-05-19T15:34:21Z)
- Counterfactual Evaluation for Explainable AI [21.055319253405603]
We propose a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective.
We introduce two algorithms to find the proper counterfactuals in both discrete and continuous scenarios and then use the acquired counterfactuals to measure faithfulness.
arXiv Detail & Related papers (2021-09-05T01:38:49Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability [54.85658598523915]
We propose establishing a concrete definition of interpretation before evaluating the faithfulness of an interpretation.
We find that although interpretation methods perform differently under a certain evaluation metric, such a difference may not result from interpretation quality or faithfulness.
arXiv Detail & Related papers (2020-09-16T06:38:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.