Towards Faithful Model Explanation in NLP: A Survey
- URL: http://arxiv.org/abs/2209.11326v4
- Date: Fri, 12 Jan 2024 20:19:20 GMT
- Title: Towards Faithful Model Explanation in NLP: A Survey
- Authors: Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
- Abstract summary: End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
- Score: 48.690624266879155
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: End-to-end neural Natural Language Processing (NLP) models are notoriously
difficult to understand. This has given rise to numerous efforts towards model
explainability in recent years. One desideratum of model explanation is
faithfulness, i.e. an explanation should accurately represent the reasoning
process behind the model's prediction. In this survey, we review over 110 model
explanation methods in NLP through the lens of faithfulness. We first discuss
the definition and evaluation of faithfulness, as well as its significance for
explainability. We then introduce recent advances in faithful explanation,
grouping existing approaches into five categories: similarity-based methods,
analysis of model-internal structures, backpropagation-based methods,
counterfactual intervention, and self-explanatory models. For each category, we
synthesize its representative studies, strengths, and weaknesses. Finally, we
summarize their common virtues and remaining challenges, and reflect on future
work directions towards faithful explainability in NLP.
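To make the taxonomy concrete, the sketch below illustrates one of the five categories named in the abstract, a backpropagation-based explanation (gradient x input saliency) for a text classifier. It is a minimal sketch under stated assumptions, not code from the survey: the model checkpoint, the gradient_x_input_saliency helper, and the use of PyTorch with Hugging Face transformers are all illustrative choices.

```python
# Minimal sketch of a backpropagation-based explanation (gradient x input saliency).
# The checkpoint name, helper name, and PyTorch / Hugging Face setup are assumptions
# made for illustration; they are not the survey's own implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def gradient_x_input_saliency(text: str):
    """Score each token by (gradient of the predicted logit) x (token embedding)."""
    enc = tokenizer(text, return_tensors="pt")
    # Embed the tokens explicitly so gradients can be taken w.r.t. the embeddings.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
    pred = logits.argmax(dim=-1).item()
    logits[0, pred].backward()  # populates embeds.grad for the predicted class
    # Gradient x input, reduced over the embedding dimension: one score per token.
    scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

if __name__ == "__main__":
    for token, score in gradient_x_input_saliency("The movie was surprisingly good."):
        print(f"{token:>12s}  {score:+.4f}")
```

Gradient x input is only one representative of the backpropagation-based category; as the survey stresses, such scores still need to be evaluated for faithfulness (e.g. via perturbation or counterfactual tests) before being trusted as explanations.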
Related papers
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z) - An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Faithfulness Tests for Natural Language Explanations [87.01093277918599]
Explanations of neural models aim to reveal a model's decision-making process for its predictions.
Recent work shows that current explanation methods, such as saliency maps or counterfactuals, can be misleading.
This work explores the challenging question of evaluating the faithfulness of natural language explanations.
arXiv Detail & Related papers (2023-05-29T11:40:37Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach
to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in
NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Post-hoc Interpretability for Neural NLP: A Survey [38.67924043709067]
Interpretability serves to provide explanations in terms that are understandable to humans.
This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans.
arXiv Detail & Related papers (2021-08-10T18:00:14Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - A Survey of the State of Explainable AI for Natural Language Processing [16.660110121500125]
This survey presents an overview of the current state of Explainable AI (XAI).
We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized.
We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community.
arXiv Detail & Related papers (2020-10-01T22:33:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.