A survey on improving NLP models with human explanations
- URL: http://arxiv.org/abs/2204.08892v1
- Date: Tue, 19 Apr 2022 13:43:31 GMT
- Title: A survey on improving NLP models with human explanations
- Authors: Mareike Hartmann and Daniel Sonntag
- Abstract summary: Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data.
Similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction.
- Score: 10.14196008734383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training a model with access to human explanations can improve data
efficiency and model performance on in- and out-of-domain data. Adding to these
empirical findings, similarity with the process of human learning makes
learning from explanations a promising way to establish a fruitful
human-machine interaction. Several methods have been proposed for improving
natural language processing (NLP) models with human explanations; they rely on
different explanation types and mechanisms for integrating these explanations
into the learning process. The methods are rarely compared with each other,
making it hard for practitioners to choose the best combination of explanation
type and integration mechanism for a specific use-case. In this paper, we give
an overview of different methods for learning from human explanations, and
discuss different factors that can inform the decision of which method to
choose for a specific use-case.
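As a rough illustration of one possible integration mechanism (a sketch under assumed names and shapes, not a method prescribed by the survey), the snippet below regularises a text classifier so that its token-level importance scores agree with human rationale annotations:

```python
# Hypothetical sketch: combine the usual task loss with a term that pulls the
# model's token-level attention towards human rationale annotations.
# All names, shapes, and the weighting lam are illustrative assumptions.
import torch
import torch.nn.functional as F

def explanation_regularised_loss(logits, labels, attention, rationale_mask, lam=0.5):
    """logits: (batch, n_classes), labels: (batch,),
    attention: (batch, seq_len) model importance scores,
    rationale_mask: (batch, seq_len) with 1 where a human marked a token as evidence."""
    task_loss = F.cross_entropy(logits, labels)
    # Normalise both distributions over the sequence and penalise their mismatch.
    attn = attention / attention.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    target = rationale_mask.float()
    target = target / target.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    align_loss = F.mse_loss(attn, target)
    return task_loss + lam * align_loss
```

Other integration mechanisms would replace or extend this alignment term; weighing such choices against the use-case is exactly the kind of decision the survey aims to inform.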
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to address the opacity of similarity scores in Transformer models by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
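The core idea of second-order similarity explanations can be sketched as follows (a rough approximation that uses gradient x input in place of the LRP passes BiLRP actually performs; the encoder and shapes are assumptions):

```python
# Hypothetical sketch: decompose a dot-product similarity s = <f(x), f(x')>
# into token-pair interactions by combining per-dimension attributions from
# both inputs. Gradient x input stands in for the method's LRP passes.
import torch

def pairwise_interactions(encoder, x, x_prime):
    """x, x_prime: (len, d_in) token embeddings; returns (len_x, len_x') interactions."""
    x = x.clone().requires_grad_(True)
    x_prime = x_prime.clone().requires_grad_(True)
    fx, fxp = encoder(x), encoder(x_prime)        # (d_out,) sentence embeddings
    R = torch.zeros(x.shape[0], x_prime.shape[0])
    for m in range(fx.shape[0]):
        gx, = torch.autograd.grad(fx[m], x, retain_graph=True)
        gxp, = torch.autograd.grad(fxp[m], x_prime, retain_graph=True)
        a = (gx * x).sum(dim=-1)                  # per-token contribution to f_m(x)
        b = (gxp * x_prime).sum(dim=-1)           # per-token contribution to f_m(x')
        R += torch.outer(a, b)                    # interaction of token i with token j
    return R                                      # R.sum() ~ <f(x), f(x')> for linear f
```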
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on the tasks when users were provided with any of the tested saliency maps.
These findings urge caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- MaNtLE: Model-agnostic Natural Language Explainer [9.43206883360088]
We introduce MaNtLE, a model-agnostic natural language explainer that analyzes multiple classifier predictions.
MaNtLE uses multi-task training on thousands of synthetic classification tasks to generate faithful explanations.
Simulated user studies indicate that, on average, MaNtLE-generated explanations are at least 11% more faithful than LIME and Anchors explanations.
arXiv Detail & Related papers (2023-05-22T12:58:06Z)
- Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments [0.6091702876917281]
Much work in explainable AI has aimed to devise explanation methods that give humans insight into the workings and predictions of NLP models.
Innate human tendencies and biases can hinder how well people understand these explanations.
We designed a randomized survey-based experiment to understand the effectiveness of saliency-based Post-hoc explainability methods in Natural Language Processing.
arXiv Detail & Related papers (2022-11-25T08:49:01Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which makes them harder for humans to interpret.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
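A minimal sketch of the contrastive idea (assuming a Hugging Face-style causal LM that accepts inputs_embeds; names and shapes are illustrative, not the paper's code): attribute the difference between the logit of the predicted token and the logit of a plausible foil, rather than a single logit in isolation.

```python
# Hypothetical contrastive gradient x input attribution: explain why the model
# prefers target_id over foil_id for the next token, given embedded inputs.
import torch

def contrastive_attribution(model, input_embeds, target_id, foil_id):
    """input_embeds: (1, seq_len, d) embedded prompt; returns (seq_len,) token scores."""
    input_embeds = input_embeds.clone().detach().requires_grad_(True)
    logits = model(inputs_embeds=input_embeds).logits[0, -1]  # next-token logits
    contrast = logits[target_id] - logits[foil_id]            # "why A rather than B?"
    contrast.backward()
    # Sum gradient x input over the embedding dimension for one score per token.
    return (input_embeds.grad * input_embeds.detach()).sum(dim=-1).squeeze(0)
```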
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
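One such agreement check can be sketched as a simple token-level ranking metric (illustrative only; the paper's exact diagnostic properties and metrics may differ):

```python
# Hypothetical sketch: measure how well a technique's saliency scores rank the
# tokens that human annotators marked as salient, using average precision.
from sklearn.metrics import average_precision_score

def rationale_agreement(saliency_scores, human_mask):
    """saliency_scores: per-token floats; human_mask: per-token 0/1 human labels."""
    return average_precision_score(human_mask, saliency_scores)

# A technique that ranks the human-marked tokens highest scores 1.0.
print(rationale_agreement([0.9, 0.1, 0.8, 0.05], [1, 0, 1, 0]))
```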
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
- The Explanation Game: Towards Prediction Explainability through Sparse Communication [6.497816402045099]
We provide a unified perspective of explainability as a communication problem between an explainer and a layperson.
We use this framework to compare several prior approaches for extracting explanations.
We propose new embedded methods for explainability, through the use of selective, sparse attention.
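One standard way to obtain such selective, sparse attention is a sparsemax-style transformation, sketched below (a generic illustration, not the paper's embedded explainers):

```python
# Sparsemax (Martins & Astudillo, 2016): a softmax-like mapping that can assign
# exact zeros, so the non-zero entries act as a selected, human-readable subset.
import torch

def sparsemax(z):
    """Sparsemax of a 1-D score tensor; returns a probability vector with exact zeros."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = 1 + k * z_sorted > cumsum      # entries that stay in the support
    k_z = support.sum()
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold subtracted from all scores
    return torch.clamp(z - tau, min=0)

z = torch.tensor([2.0, 1.0, 0.1])
print(torch.softmax(z, dim=0))  # dense: every token receives some mass
print(sparsemax(z))             # sparse: tensor([1., 0., 0.]) selects a single token
```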
arXiv Detail & Related papers (2020-04-28T22:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.