A Song of (Dis)agreement: Evaluating the Evaluation of Explainable
Artificial Intelligence in Natural Language Processing
- URL: http://arxiv.org/abs/2205.04559v1
- Date: Mon, 9 May 2022 21:07:39 GMT
- Title: A Song of (Dis)agreement: Evaluating the Evaluation of Explainable
Artificial Intelligence in Natural Language Processing
- Authors: Michael Neely, Stefan F. Schouten, Maurits Bleeker, Ana Lucic
- Abstract summary: We argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations.
We find that attention-based explanations do not correlate strongly with any recent feature attribution methods.
- Score: 7.527234046228323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant debate in the NLP community about whether or not
attention weights can be used as an explanation - a mechanism for interpreting
how important each input token is for a particular prediction. The validity of
"attention as explanation" has so far been evaluated by computing the rank
correlation between attention-based explanations and existing feature
attribution explanations using LSTM-based models. In our work, we (i) compare
the rank correlation between five more recent feature attribution methods and
two attention-based methods, on two types of NLP tasks, and (ii) extend this
analysis to also include transformer-based models. We find that attention-based
explanations do not correlate strongly with any recent feature attribution
methods, regardless of the model or task. Furthermore, we find that none of the
tested explanations correlate strongly with one another for the
transformer-based model, leading us to question the underlying assumption that
we should measure the validity of attention-based explanations based on how
well they correlate with existing feature attribution explanation methods.
After conducting experiments on five datasets using two different models, we
argue that the community should stop using rank correlation as an evaluation
metric for attention-based explanations. We suggest that researchers and
practitioners should instead test various explanation methods and employ a
human-in-the-loop process to determine if the explanations align with human
intuition for the particular use case at hand.
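Illustrative sketch (not the authors' code): to make the critiqued evaluation concrete, the snippet below computes a Kendall's tau rank correlation between an attention-based token ranking and a simple input-x-gradient feature attribution for one prediction of an off-the-shelf transformer classifier. The model name, the averaging of attention over layers and heads, and the choice of input x gradient are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's code) of comparing an attention-based token
# ranking with a gradient-based feature attribution via rank correlation.
# Model name and pooling choices are assumptions for illustration only.
import torch
from scipy.stats import kendalltau
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The plot was thin but the performances were wonderful."
enc = tokenizer(text, return_tensors="pt")

# Attention-based "explanation": mean attention received by each token,
# averaged over layers and heads, then over query positions.
with torch.no_grad():
    out = model(**enc)
attn = torch.stack(out.attentions).mean(dim=(0, 2))  # (batch, seq, seq)
attention_scores = attn.mean(dim=1).squeeze(0)        # (seq,)

# Gradient-based feature attribution: input x gradient on the input embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred_class = logits.argmax(dim=-1).item()
logits[0, pred_class].backward()
grad_scores = (embeds.grad * embeds).sum(dim=-1).abs().squeeze(0)  # (seq,)

# Rank correlation between the two token rankings.
tau, p_value = kendalltau(attention_scores.numpy(), grad_scores.detach().numpy())
print(f"Kendall's tau between attention and input-x-gradient rankings: {tau:.3f} (p={p_value:.3f})")
```

A low tau here only says that the two methods rank tokens differently; as the abstract argues, that disagreement by itself does not tell us which explanation, if either, is valid.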
Related papers
- Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI [0.7317046947172644]
We compare four explainability approaches: gradient-based, perturbation-based, attention-based, and prototype-based.
Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability.
arXiv Detail & Related papers (2024-07-25T10:17:04Z)
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Order in the Court: Explainable AI Methods Prone to Disagreement [0.0]
In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision.
Previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience.
We show that rank correlation is largely uninformative and does not measure the quality of feature-additive methods.
arXiv Detail & Related papers (2021-05-07T14:27:37Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output (a rough illustrative sketch of this idea appears after this list).
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
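Illustrative sketch (not from the cited paper): the Leakage-Adjusted Simulatability entry above describes the metric only in one sentence. The snippet below is a rough, purely hypothetical rendering of that idea; the function and variable names are invented here, and the original paper's exact definition may differ.

```python
# Hypothetical sketch of a leakage-adjusted-simulatability-style score, based only
# on the one-line summary above; the original LAS definition may differ.
from statistics import mean

def las_style_score(labels, sim_with_expl, sim_without_expl, sim_expl_only):
    """Average simulatability gain, computed separately for 'leaking' and
    'non-leaking' examples so that explanations which simply restate the
    model's output do not dominate the score. Each argument is a list of
    predicted labels aligned by example index."""
    gains = {True: [], False: []}
    for y, with_e, without_e, expl_only in zip(labels, sim_with_expl, sim_without_expl, sim_expl_only):
        leaking = (expl_only == y)  # explanation alone already reveals the output
        gain = int(with_e == y) - int(without_e == y)
        gains[leaking].append(gain)
    # Average the per-group mean gains (skip a group if it is empty).
    group_means = [mean(g) for g in gains.values() if g]
    return mean(group_means)

# Tiny usage example with made-up simulator predictions:
labels           = [1, 0, 1, 1]   # the model's actual outputs
sim_with_expl    = [1, 0, 1, 0]   # observer sees input + explanation
sim_without_expl = [0, 0, 1, 0]   # observer sees input only
sim_expl_only    = [1, 0, 0, 0]   # observer sees explanation only (leakage probe)
print(las_style_score(labels, sim_with_expl, sim_without_expl, sim_expl_only))  # 0.25
```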
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.