Faithfulness Tests for Natural Language Explanations
- URL: http://arxiv.org/abs/2305.18029v2
- Date: Fri, 30 Jun 2023 15:41:47 GMT
- Title: Faithfulness Tests for Natural Language Explanations
- Authors: Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas
Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein
- Abstract summary: Explanations of neural models aim to reveal a model's decision-making process for its predictions.
Recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading.
This work explores the challenging question of evaluating the faithfulness of natural language explanations.
- Score: 87.01093277918599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explanations of neural models aim to reveal a model's decision-making process
for its predictions. However, recent work shows that current methods giving
explanations such as saliency maps or counterfactuals can be misleading, as
they are prone to present reasons that are unfaithful to the model's inner
workings. This work explores the challenging question of evaluating the
faithfulness of natural language explanations (NLEs). To this end, we present
two tests. First, we propose a counterfactual input editor for inserting
reasons that lead to counterfactual predictions but are not reflected by the
NLEs. Second, we reconstruct inputs from the reasons stated in the generated
NLEs and check how often they lead to the same predictions. Our tests can
evaluate emerging NLE models, providing a fundamental tool in the development of
faithful NLEs.
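The two tests lend themselves to a small evaluation harness. The Python sketch below is an illustration only, not the authors' released code: `predict`, `explain`, `extract_reasons`, and the list of candidate insertion words are hypothetical stand-ins for the task model, the NLE generator, a reason extractor, and the counterfactual input editor, which the paper implements as a trained editor rather than naive word insertion.
```python
from typing import Callable, Iterable, List

def counterfactual_test(text: str,
                        predict: Callable[[str], str],
                        explain: Callable[[str], str],
                        candidate_words: Iterable[str]) -> float:
    """Test 1 (sketch): insert words that change the prediction and count how
    often the decisive word is absent from the new NLE (proxy for unfaithfulness)."""
    original = predict(text)
    flips, unfaithful = 0, 0
    for word in candidate_words:
        edited = f"{word} {text}"            # naive insertion; the paper uses a trained editor
        if predict(edited) != original:      # the inserted word flipped the prediction
            flips += 1
            if word.lower() not in explain(edited).lower():
                unfaithful += 1              # NLE never mentions the word that caused the flip
    return unfaithful / flips if flips else 0.0

def reconstruction_test(texts: List[str],
                        predict: Callable[[str], str],
                        explain: Callable[[str], str],
                        extract_reasons: Callable[[str], List[str]]) -> float:
    """Test 2 (sketch): rebuild an input from the reasons stated in the NLE and
    check whether the model still makes the same prediction."""
    same = 0
    for text in texts:
        reasons = extract_reasons(explain(text))    # e.g. noun phrases mentioned in the NLE
        rebuilt = " ".join(reasons)
        same += int(predict(rebuilt) == predict(text))
    return same / len(texts)

if __name__ == "__main__":
    # Toy stand-ins so the harness runs end to end.
    predict = lambda s: "positive" if "good" in s else "negative"
    explain = lambda s: "Positive because it says good." if "good" in s else "Negative: no praise."
    extract_reasons = lambda nle: [w.strip(".") for w in nle.split() if w.strip(".") == "good"]
    print(counterfactual_test("the service was slow", predict, explain, ["good", "awful"]))
    print(reconstruction_test(["good pasta", "slow service"], predict, explain, extract_reasons))
```
In this toy run a faithful explainer yields a low score on the first test and a high agreement rate on the second; the paper applies the same logic with a trained editor and real NLE models.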
Related papers
- XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge.
Evaluating forecast NLEs is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z)
- Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations - extractive and counterfactual.
Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process.
We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z)
- KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations [52.33256203018764]
We leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs.
We show that models with higher NLE quality do not necessarily generate fewer inconsistencies.
arXiv Detail & Related papers (2023-06-05T15:51:58Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that, by iteratively feeding the counterfactual to the explainer, we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
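One way to picture this iteration is the loop below; a rough sketch under the assumption of a binary sentiment task and an editor exposed as a single `edit(text, target_label)` callable, neither of which is specified in the summary.
```python
def iterate_counterfactuals(text, predict, edit, rounds=4):
    """Back-translation-style probe (sketch): repeatedly hand the editor its own
    counterfactual and record how the text and the predictions drift."""
    trace = [(text, predict(text))]
    current = text
    for _ in range(rounds):
        target = "negative" if trace[-1][1] == "positive" else "positive"
        current = edit(current, target)              # ask the editor to flip the label
        trace.append((current, predict(current)))
    return trace  # e.g. does round 2 recover something close to the original input?

# Toy stand-ins so the loop runs.
predict = lambda s: "positive" if "good" in s else "negative"
edit = lambda s, t: s.replace("bad", "good") if t == "positive" else s.replace("good", "bad")
print(iterate_counterfactuals("the food was bad", predict, edit))
```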
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks [0.0]
Natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way.
While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models.
We propose three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness.
arXiv Detail & Related papers (2023-04-03T08:24:10Z)
- Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
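Read as pseudocode, and with `extract_predicates` and `build_counterfactual` as purely hypothetical helpers standing in for the paper's logical machinery, the consistency check might look like this rough sketch:
```python
def counterfactual_consistency(premise, hypothesis, predict_nli, explain,
                               extract_predicates, build_counterfactual):
    """Sketch: derive a counterfactual hypothesis from the predicates the NLE
    relies on, then check the model's prediction against the expressed logic."""
    nle = explain(premise, hypothesis)
    predicates = extract_predicates(nle)                       # e.g. ["dog(x)"]
    new_hypothesis, expected = build_counterfactual(hypothesis, predicates)
    return predict_nli(premise, new_hypothesis) == expected    # True = consistent

# Toy stand-ins: negating the predicate the NLE cites should turn
# "entailment" into "contradiction" if the explanation is faithful.
predict_nli = lambda p, h: "contradiction" if "not" in h else "entailment"
explain = lambda p, h: "The hypothesis holds because the premise says the animal is a dog."
extract_predicates = lambda nle: ["dog(x)"]
build_counterfactual = lambda h, preds: ("The animal is not a dog.", "contradiction")
print(counterfactual_consistency("A dog runs in the park.", "The animal is a dog.",
                                 predict_nli, explain, extract_predicates, build_counterfactual))
```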
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.