Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial
Explanations of Their Behavior in Natural Language?
- URL: http://arxiv.org/abs/2010.04119v1
- Date: Thu, 8 Oct 2020 16:59:07 GMT
- Title: Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial
Explanations of Their Behavior in Natural Language?
- Authors: Peter Hase, Shiyue Zhang, Harry Xie, Mohit Bansal
- Abstract summary: We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
- Score: 86.60613602337246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data collection for natural language (NL) understanding tasks has
increasingly included human explanations alongside data points, allowing past
works to introduce models that both perform a task and generate NL explanations
for their outputs. Yet to date, model-generated explanations have been
evaluated on the basis of surface-level similarities to human explanations,
both through automatic metrics like BLEU and human evaluations. We argue that
these evaluations are insufficient, since they fail to indicate whether
explanations support actual model behavior (faithfulness), rather than simply
match what a human would say (plausibility). In this work, we address the
problem of evaluating explanations from the model simulatability perspective.
Our contributions are as follows: (1) We introduce a leakage-adjusted
simulatability (LAS) metric for evaluating NL explanations, which measures how
well explanations help an observer predict a model's output, while controlling
for how explanations can directly leak the output. We use a model as a proxy
for a human observer, and validate this choice with two human subject
experiments. (2) Using the CoS-E and e-SNLI datasets, we evaluate two existing
generative graphical models and two new approaches; one rationalizing method we
introduce achieves roughly human-level LAS scores. (3) Lastly, we frame
explanation generation as a multi-agent game and optimize explanations for
simulatability while penalizing label leakage, which can improve LAS scores. We
provide code for the experiments in this paper at
https://github.com/peterbhase/LAS-NL-Explanations
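To make the LAS idea concrete, here is a minimal Python sketch of how a leakage-adjusted simulatability score of this kind can be computed. It assumes a simulator model queried in three modes (input plus explanation, input alone, explanation alone), treats an explanation as "leaking" when the simulator recovers the task model's output from the explanation alone, and macro-averages the explanation's effect on simulator accuracy over the leaked and non-leaked subsets. The `simulate` callable, the "[SEP]" joining, and the exact averaging are illustrative assumptions, not the authors' reference implementation (see the linked repository for that).

```python
from typing import Callable, List, Sequence


def las_score(
    inputs: Sequence[str],
    explanations: Sequence[str],
    model_outputs: Sequence[int],
    simulate: Callable[[str], int],
) -> float:
    """Illustrative leakage-adjusted simulatability (LAS) sketch.

    `simulate` stands in for the simulator model: it maps a textual query
    to a predicted label for the task model's output. Three query modes
    are emulated by what the query string contains: input+explanation,
    input only, and explanation only.
    """
    correct_xe: List[int] = []  # simulator sees input and explanation
    correct_x: List[int] = []   # simulator sees input only (baseline)
    leaked: List[int] = []      # explanation alone reveals the output

    for x, e, y_hat in zip(inputs, explanations, model_outputs):
        correct_xe.append(int(simulate(f"{x} [SEP] {e}") == y_hat))
        correct_x.append(int(simulate(x) == y_hat))
        leaked.append(int(simulate(e) == y_hat))

    def gain(indicator: int) -> float:
        # Accuracy gain from adding explanations, within one leakage bin.
        idx = [i for i, flag in enumerate(leaked) if flag == indicator]
        if not idx:
            return 0.0
        acc_xe = sum(correct_xe[i] for i in idx) / len(idx)
        acc_x = sum(correct_x[i] for i in idx) / len(idx)
        return acc_xe - acc_x

    # Macro-average over the non-leaking (0) and leaking (1) subsets, so
    # explanations that merely restate the label are not rewarded.
    return 0.5 * (gain(0) + gain(1))
```

Under this construction, an explanation that simply restates the predicted label falls almost entirely into the leaking bin, so its contribution is averaged against the gain on the non-leaking subset rather than dominating the score; this is one way to realize the "controlling for leakage" behavior described in the abstract.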
Related papers
- XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge.
Evaluating forecast NLEs is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z)
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA [7.141288053123662]
Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences that increase users' trust in black-box systems.
Existing post-hoc explanations are not always aligned with human logical inference, suffering from three issues: 1) deductive unsatisfiability, where the generated explanations do not logically lead to the answer; 2) factual inconsistency, where the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity, where the model cannot recognize the semantic changes caused by small perturbations.
arXiv Detail & Related papers (2023-12-21T05:51:55Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations [27.624182544486334]
We build on the view that the quality of a human-annotated explanation can be measured based on its helpfulness.
We define a new metric that can take into consideration the helpfulness of an explanation for model performance.
arXiv Detail & Related papers (2023-05-04T19:31:50Z)
- To what extent do human explanations of model behavior align with actual model behavior? [91.67905128825402]
We investigated the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions.
We defined two alignment metrics that quantify how well natural language human explanations align with model sensitivity to input words.
We find that a model's alignment with human explanations is not predicted by the model's accuracy on NLI.
arXiv Detail & Related papers (2020-12-24T17:40:06Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.