A Song of (Dis)agreement: Evaluating the Evaluation of Explainable
Artificial Intelligence in Natural Language Processing
- URL: http://arxiv.org/abs/2205.04559v1
- Date: Mon, 9 May 2022 21:07:39 GMT
- Title: A Song of (Dis)agreement: Evaluating the Evaluation of Explainable
Artificial Intelligence in Natural Language Processing
- Authors: Michael Neely, Stefan F. Schouten, Maurits Bleeker, Ana Lucic
- Abstract summary: We argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations.
We find that attention-based explanations do not correlate strongly with any recent feature attribution methods.
- Score: 7.527234046228323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant debate in the NLP community about whether or not
attention weights can be used as an explanation - a mechanism for interpreting
how important each input token is for a particular prediction. The validity of
"attention as explanation" has so far been evaluated by computing the rank
correlation between attention-based explanations and existing feature
attribution explanations using LSTM-based models. In our work, we (i) compare
the rank correlation between five more recent feature attribution methods and
two attention-based methods, on two types of NLP tasks, and (ii) extend this
analysis to also include transformer-based models. We find that attention-based
explanations do not correlate strongly with any recent feature attribution
methods, regardless of the model or task. Furthermore, we find that none of the
tested explanations correlate strongly with one another for the
transformer-based model, leading us to question the underlying assumption that
we should measure the validity of attention-based explanations based on how
well they correlate with existing feature attribution explanation methods.
After conducting experiments on five datasets using two different models, we
argue that the community should stop using rank correlation as an evaluation
metric for attention-based explanations. We suggest that researchers and
practitioners should instead test various explanation methods and employ a
human-in-the-loop process to determine if the explanations align with human
intuition for the particular use case at hand.
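Illustrative sketch (not the authors' code): to make the critiqued evaluation concrete, the snippet below computes a Kendall's tau rank correlation between an attention-based token ranking and a simple input-x-gradient feature attribution for one prediction of an off-the-shelf transformer classifier. The model name, the averaging of attention over layers and heads, and the choice of input x gradient are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's code) of comparing an attention-based token
# ranking with a gradient-based feature attribution via rank correlation.
# Model name and pooling choices are assumptions for illustration only.
import torch
from scipy.stats import kendalltau
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The plot was thin but the performances were wonderful."
enc = tokenizer(text, return_tensors="pt")

# Attention-based "explanation": mean attention received by each token,
# averaged over layers and heads, then over query positions.
with torch.no_grad():
    out = model(**enc)
attn = torch.stack(out.attentions).mean(dim=(0, 2))  # (batch, seq, seq)
attention_scores = attn.mean(dim=1).squeeze(0)        # (seq,)

# Gradient-based feature attribution: input x gradient on the input embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred_class = logits.argmax(dim=-1).item()
logits[0, pred_class].backward()
grad_scores = (embeds.grad * embeds).sum(dim=-1).abs().squeeze(0)  # (seq,)

# Rank correlation between the two token rankings.
tau, p_value = kendalltau(attention_scores.numpy(), grad_scores.detach().numpy())
print(f"Kendall's tau between attention and input-x-gradient rankings: {tau:.3f} (p={p_value:.3f})")
```

A low tau here only says that the two methods rank tokens differently; as the abstract argues, that disagreement by itself does not tell us which explanation, if either, is valid.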
Related papers
- Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI [0.7317046947172644]
We compare four explainability approaches: gradient-based, perturbation-based, attention-based, and prototype-based.
Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability.
arXiv Detail & Related papers (2024-07-25T10:17:04Z)
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Order in the Court: Explainable AI Methods Prone to Disagreement [0.0]
In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision.
Previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience.
We show that rank correlation is largely uninformative and does not measure the quality of feature-additive methods.
arXiv Detail & Related papers (2021-05-07T14:27:37Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output (a rough illustrative sketch of this idea appears after this list).
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
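Illustrative sketch (not from the cited paper): the Leakage-Adjusted Simulatability entry above describes the metric only in one sentence. The snippet below is a rough, purely hypothetical rendering of that idea; the function and variable names are invented here, and the original paper's exact definition may differ.

```python
# Hypothetical sketch of a leakage-adjusted-simulatability-style score, based only
# on the one-line summary above; the original LAS definition may differ.
from statistics import mean

def las_style_score(labels, sim_with_expl, sim_without_expl, sim_expl_only):
    """Average simulatability gain, computed separately for 'leaking' and
    'non-leaking' examples so that explanations which simply restate the
    model's output do not dominate the score. Each argument is a list of
    predicted labels aligned by example index."""
    gains = {True: [], False: []}
    for y, with_e, without_e, expl_only in zip(labels, sim_with_expl, sim_without_expl, sim_expl_only):
        leaking = (expl_only == y)  # explanation alone already reveals the output
        gain = int(with_e == y) - int(without_e == y)
        gains[leaking].append(gain)
    # Average the per-group mean gains (skip a group if it is empty).
    group_means = [mean(g) for g in gains.values() if g]
    return mean(group_means)

# Tiny usage example with made-up simulator predictions:
labels           = [1, 0, 1, 1]   # the model's actual outputs
sim_with_expl    = [1, 0, 1, 0]   # observer sees input + explanation
sim_without_expl = [0, 0, 1, 0]   # observer sees input only
sim_expl_only    = [1, 0, 0, 0]   # observer sees explanation only (leakage probe)
print(las_style_score(labels, sim_with_expl, sim_without_expl, sim_expl_only))  # 0.25
```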
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.