Post-hoc Interpretability for Neural NLP: A Survey
- URL: http://arxiv.org/abs/2108.04840v5
- Date: Tue, 28 Nov 2023 06:39:41 GMT
- Title: Post-hoc Interpretability for Neural NLP: A Survey
- Authors: Andreas Madsen, Siva Reddy, Sarath Chandar
- Abstract summary: Interpretability serves to provide explanations in terms that are understandable to humans.
This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans.
- Score: 38.67924043709067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks for NLP are becoming increasingly complex and widespread, and
there is growing concern about whether these models are responsible to use. Explaining
models helps to address safety and ethical concerns and is essential for
accountability. Interpretability serves to provide these explanations in terms
that are understandable to humans. Additionally, post-hoc methods provide
explanations after a model is learned and are generally model-agnostic. This
survey categorizes how recent post-hoc interpretability methods communicate
explanations to humans, discusses each method in depth, and examines how the
methods are validated, since validation is a common concern.
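To make the terms concrete, here is a minimal sketch of a post-hoc, model-agnostic explanation: leave-one-out (occlusion) token importance, which is computed after training and needs only black-box access to a prediction function. The `predict_proba` callable, the mask token, and the toy classifier are illustrative assumptions, not code from the survey.

```python
# Minimal sketch of a model-agnostic, post-hoc explanation:
# leave-one-out (occlusion) token importance. Only black-box access to a
# trained model's prediction function is assumed; `predict_proba` and the
# mask token are hypothetical placeholders, not part of any cited method.
from typing import Callable, List, Tuple

def occlusion_importance(
    tokens: List[str],
    predict_proba: Callable[[List[str]], float],  # P(target class | tokens)
    mask_token: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Score each token by the drop in the target-class probability
    when that token is replaced with a mask token."""
    baseline = predict_proba(tokens)
    importances = []
    for i, token in enumerate(tokens):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        importances.append((token, baseline - predict_proba(occluded)))
    return importances

if __name__ == "__main__":
    # Toy stand-in for a trained sentiment classifier (illustrative only).
    def toy_predict_proba(tokens: List[str]) -> float:
        return 0.9 if "great" in tokens else 0.2

    scores = occlusion_importance("the movie was great".split(), toy_predict_proba)
    print(scores)  # "great" receives the largest importance score
```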
Related papers
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings urge caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments [0.6091702876917281]
A lot of work in Explainable AI has aimed to devise explanation methods that give humans insights into the workings and predictions of NLP models.
Innate human tendencies and biases can hinder humans' understanding of these explanations.
We designed a randomized survey-based experiment to understand the effectiveness of saliency-based post-hoc explainability methods in Natural Language Processing (a minimal saliency-map sketch is shown after this list).
arXiv Detail & Related papers (2022-11-25T08:49:01Z)
- On the Robustness of Explanations of Deep Neural Network Models: A Survey [14.940679892694089]
We present a comprehensive survey of methods that study, understand, attack, and defend explanations of Deep Neural Network (DNN) models.
We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods.
arXiv Detail & Related papers (2022-11-09T10:14:21Z)
- Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z)
- NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning [59.16962123636579]
This paper proposes a new take on Prolog-based inference engines.
We replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval.
Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA.
arXiv Detail & Related papers (2022-09-16T00:54:44Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Generalizable Neuro-symbolic Systems for Commonsense Question Answering [67.72218865519493]
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks.
Different methods for integrating neural language models and knowledge graphs are discussed.
arXiv Detail & Related papers (2022-01-17T06:13:37Z)
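Several of the papers above evaluate saliency maps as post-hoc explanations. As a hedged illustration of what such a map is, the following sketch computes a gradient x input attribution over a toy PyTorch text classifier; the tiny model, vocabulary, and class index are invented for this example and do not correspond to any of the cited systems.

```python
# Minimal sketch of a gradient x input saliency map for a text classifier.
# The embedding-pool classifier and vocabulary are toy stand-ins, assumed
# for illustration only.
import torch
import torch.nn as nn

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size: int = 4, dim: int = 8, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.linear = nn.Linear(dim, num_classes)

    def forward(self, embedded: torch.Tensor) -> torch.Tensor:
        # Mean-pool the token embeddings, then classify.
        return self.linear(embedded.mean(dim=0))

model = TinyClassifier()
tokens = "the movie was great".split()
ids = torch.tensor([vocab[t] for t in tokens])

# Attribution is computed on the embedding vectors so each token gets a score.
embedded = model.embed(ids).detach().requires_grad_(True)
logits = model(embedded)
logits[1].backward()  # gradient of the (hypothetical) "positive" class logit

saliency = (embedded.grad * embedded).sum(dim=-1)  # gradient x input, per token
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token}\t{score:+.4f}")
```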