A survey on improving NLP models with human explanations
- URL: http://arxiv.org/abs/2204.08892v1
- Date: Tue, 19 Apr 2022 13:43:31 GMT
- Title: A survey on improving NLP models with human explanations
- Authors: Mareike Hartmann and Daniel Sonntag
- Abstract summary: Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data.
Similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction.
- Score: 10.14196008734383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training a model with access to human explanations can improve data
efficiency and model performance on in- and out-of-domain data. Adding to these
empirical findings, similarity with the process of human learning makes
learning from explanations a promising way to establish a fruitful
human-machine interaction. Several methods have been proposed for improving
natural language processing (NLP) models with human explanations; they rely on
different explanation types and mechanisms for integrating these explanations
into the learning process. The methods are rarely compared with each other,
making it hard for practitioners to choose the best combination of explanation
type and integration mechanism for a specific use-case. In this paper, we give
an overview of different methods for learning from human explanations, and
discuss different factors that can inform the decision of which method to
choose for a specific use-case.
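As a rough illustration of one possible integration mechanism (a sketch under assumed names and shapes, not a method prescribed by the survey), the snippet below regularises a text classifier so that its token-level importance scores agree with human rationale annotations:

```python
# Hypothetical sketch: combine the usual task loss with a term that pulls the
# model's token-level attention towards human rationale annotations.
# All names, shapes, and the weighting lam are illustrative assumptions.
import torch
import torch.nn.functional as F

def explanation_regularised_loss(logits, labels, attention, rationale_mask, lam=0.5):
    """logits: (batch, n_classes), labels: (batch,),
    attention: (batch, seq_len) model importance scores,
    rationale_mask: (batch, seq_len) with 1 where a human marked a token as evidence."""
    task_loss = F.cross_entropy(logits, labels)
    # Normalise both distributions over the sequence and penalise their mismatch.
    attn = attention / attention.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    target = rationale_mask.float()
    target = target / target.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    align_loss = F.mse_loss(attn, target)
    return task_loss + lam * align_loss
```

Other integration mechanisms would replace or extend this alignment term; weighing such choices against the use-case is exactly the kind of decision the survey aims to inform.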
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to address the opacity of similarity scores in Transformer models by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
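The core idea of second-order similarity explanations can be sketched as follows (a rough approximation that uses gradient x input in place of the LRP passes BiLRP actually performs; the encoder and shapes are assumptions):

```python
# Hypothetical sketch: decompose a dot-product similarity s = <f(x), f(x')>
# into token-pair interactions by combining per-dimension attributions from
# both inputs. Gradient x input stands in for the method's LRP passes.
import torch

def pairwise_interactions(encoder, x, x_prime):
    """x, x_prime: (len, d_in) token embeddings; returns (len_x, len_x') interactions."""
    x = x.clone().requires_grad_(True)
    x_prime = x_prime.clone().requires_grad_(True)
    fx, fxp = encoder(x), encoder(x_prime)        # (d_out,) sentence embeddings
    R = torch.zeros(x.shape[0], x_prime.shape[0])
    for m in range(fx.shape[0]):
        gx, = torch.autograd.grad(fx[m], x, retain_graph=True)
        gxp, = torch.autograd.grad(fxp[m], x_prime, retain_graph=True)
        a = (gx * x).sum(dim=-1)                  # per-token contribution to f_m(x)
        b = (gxp * x_prime).sum(dim=-1)           # per-token contribution to f_m(x')
        R += torch.outer(a, b)                    # interaction of token i with token j
    return R                                      # R.sum() ~ <f(x), f(x')> for linear f
```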
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on the tasks when users were provided with any of the tested saliency maps.
These findings urge caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- MaNtLE: Model-agnostic Natural Language Explainer [9.43206883360088]
We introduce MaNtLE, a model-agnostic natural language explainer that analyzes multiple classifier predictions.
MaNtLE uses multi-task training on thousands of synthetic classification tasks to generate faithful explanations.
Simulated user studies indicate that, on average, MaNtLE-generated explanations are at least 11% more faithful than LIME and Anchors explanations.
arXiv Detail & Related papers (2023-05-22T12:58:06Z)
- Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments [0.6091702876917281]
Much work in explainable AI has aimed to devise explanation methods that give humans insight into the workings and predictions of NLP models.
Innate human tendencies and biases can hinder how well people understand these explanations.
We designed a randomized survey-based experiment to understand the effectiveness of saliency-based Post-hoc explainability methods in Natural Language Processing.
arXiv Detail & Related papers (2022-11-25T08:49:01Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which makes them harder for humans to interpret.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
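A minimal sketch of the contrastive idea (assuming a Hugging Face-style causal LM that accepts inputs_embeds; names and shapes are illustrative, not the paper's code): attribute the difference between the logit of the predicted token and the logit of a plausible foil, rather than a single logit in isolation.

```python
# Hypothetical contrastive gradient x input attribution: explain why the model
# prefers target_id over foil_id for the next token, given embedded inputs.
import torch

def contrastive_attribution(model, input_embeds, target_id, foil_id):
    """input_embeds: (1, seq_len, d) embedded prompt; returns (seq_len,) token scores."""
    input_embeds = input_embeds.clone().detach().requires_grad_(True)
    logits = model(inputs_embeds=input_embeds).logits[0, -1]  # next-token logits
    contrast = logits[target_id] - logits[foil_id]            # "why A rather than B?"
    contrast.backward()
    # Sum gradient x input over the embedding dimension for one score per token.
    return (input_embeds.grad * input_embeds.detach()).sum(dim=-1).squeeze(0)
```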
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
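One such agreement check can be sketched as a simple token-level ranking metric (illustrative only; the paper's exact diagnostic properties and metrics may differ):

```python
# Hypothetical sketch: measure how well a technique's saliency scores rank the
# tokens that human annotators marked as salient, using average precision.
from sklearn.metrics import average_precision_score

def rationale_agreement(saliency_scores, human_mask):
    """saliency_scores: per-token floats; human_mask: per-token 0/1 human labels."""
    return average_precision_score(human_mask, saliency_scores)

# A technique that ranks the human-marked tokens highest scores 1.0.
print(rationale_agreement([0.9, 0.1, 0.8, 0.05], [1, 0, 1, 0]))
```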
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
- The Explanation Game: Towards Prediction Explainability through Sparse Communication [6.497816402045099]
We provide a unified perspective of explainability as a communication problem between an explainer and a layperson.
We use this framework to compare several prior approaches for extracting explanations.
We propose new embedded methods for explainability, through the use of selective, sparse attention.
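One standard way to obtain such selective, sparse attention is a sparsemax-style transformation, sketched below (a generic illustration, not the paper's embedded explainers):

```python
# Sparsemax (Martins & Astudillo, 2016): a softmax-like mapping that can assign
# exact zeros, so the non-zero entries act as a selected, human-readable subset.
import torch

def sparsemax(z):
    """Sparsemax of a 1-D score tensor; returns a probability vector with exact zeros."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = 1 + k * z_sorted > cumsum      # entries that stay in the support
    k_z = support.sum()
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold subtracted from all scores
    return torch.clamp(z - tau, min=0)

z = torch.tensor([2.0, 1.0, 0.1])
print(torch.softmax(z, dim=0))  # dense: every token receives some mass
print(sparsemax(z))             # sparse: tensor([1., 0., 0.]) selects a single token
```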
arXiv Detail & Related papers (2020-04-28T22:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.