An Empirical Study on Explanations in Out-of-Domain Settings
- URL: http://arxiv.org/abs/2203.00056v1
- Date: Mon, 28 Feb 2022 19:50:23 GMT
- Title: An Empirical Study on Explanations in Out-of-Domain Settings
- Authors: George Chrysostomou and Nikolaos Aletras
- Abstract summary: We study how post-hoc explanations and inherently faithful models perform in out-of-domain settings.
Results show that in many cases out-of-domain post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness, is higher than in-domain faithfulness.
Our findings also show that select-then-predict models demonstrate predictive performance in out-of-domain settings comparable to full-text trained models.
- Score: 35.07805573291534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in Natural Language Processing has focused on developing
approaches that extract faithful explanations, either via identifying the most
important tokens in the input (i.e. post-hoc explanations) or by designing
inherently faithful models that first select the most important tokens and then
use them to predict the correct label (i.e. select-then-predict models).
Currently, these approaches are largely evaluated on in-domain settings. Yet,
little is known about how post-hoc explanations and inherently faithful models
perform in out-of-domain settings. In this paper, we conduct an extensive
empirical study that examines: (1) the out-of-domain faithfulness of post-hoc
explanations, generated by five feature attribution methods; and (2) the
out-of-domain performance of two inherently faithful models over six datasets.
Contrary to our expectations, results show that in many cases out-of-domain
post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness,
is higher than in-domain faithfulness. We find this misleading and suggest using
a random baseline as a yardstick for evaluating post-hoc explanation
faithfulness. Our findings also show that select-then-predict models demonstrate
predictive performance in out-of-domain settings comparable to that of full-text
trained models.
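To make the evaluation concrete, the sketch below shows one common way to compute sufficiency and comprehensiveness (keep or remove the top-scoring tokens and measure the change in the model's confidence), together with a random-attribution baseline of the kind the abstract recommends as a yardstick. This is a minimal illustration, not the authors' implementation: the toy bag-of-words predictor, the fixed `keep_ratio`, and the helper names are assumptions, and the paper itself reports normalized variants of these scores.

```python
# Minimal sketch (not the authors' code) of removal-based faithfulness metrics.
# The toy "model", keep_ratio, and helper names are illustrative assumptions.
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[str]], float]  # p(predicted class | tokens)

def top_k_indices(importance: List[float], keep_ratio: float, n: int) -> set:
    """Indices of the most important tokens for a given rationale length."""
    k = max(1, int(n * keep_ratio))
    return set(sorted(range(n), key=lambda i: importance[i], reverse=True)[:k])

def sufficiency(predict: Predictor, tokens: List[str],
                importance: List[float], keep_ratio: float = 0.2) -> float:
    """p(y|x) - p(y|rationale only): lower means the kept tokens suffice."""
    top = top_k_indices(importance, keep_ratio, len(tokens))
    rationale = [t for i, t in enumerate(tokens) if i in top]
    return predict(tokens) - predict(rationale)

def comprehensiveness(predict: Predictor, tokens: List[str],
                      importance: List[float], keep_ratio: float = 0.2) -> float:
    """p(y|x) - p(y|x minus rationale): higher means the tokens were needed."""
    top = top_k_indices(importance, keep_ratio, len(tokens))
    remainder = [t for i, t in enumerate(tokens) if i not in top]
    return predict(tokens) - predict(remainder)

def random_attribution_baseline(predict: Predictor, tokens: List[str],
                                keep_ratio: float = 0.2, trials: int = 20,
                                seed: int = 0) -> tuple:
    """Mean sufficiency/comprehensiveness when importance scores are random,
    serving as the yardstick the abstract suggests."""
    rng = random.Random(seed)
    suf, com = [], []
    for _ in range(trials):
        scores = [rng.random() for _ in tokens]
        suf.append(sufficiency(predict, tokens, scores, keep_ratio))
        com.append(comprehensiveness(predict, tokens, scores, keep_ratio))
    return sum(suf) / trials, sum(com) / trials

if __name__ == "__main__":
    # Toy stand-in for a trained classifier: share of "positive" words.
    POSITIVE = {"great", "faithful", "good"}
    def predict(tokens: Sequence[str]) -> float:
        return 0.5 if not tokens else sum(t in POSITIVE for t in tokens) / len(tokens)

    text = "the explanations were great and the model stayed faithful".split()
    attributions = [1.0 if t in POSITIVE else 0.1 for t in text]  # mock feature attribution

    print("sufficiency:       ", sufficiency(predict, text, attributions))
    print("comprehensiveness: ", comprehensiveness(predict, text, attributions))
    print("random baseline:   ", random_attribution_baseline(predict, text))
```

Comparing an attribution method's scores against the random baseline, rather than reading them in isolation, is what guards against the deceptively high out-of-domain numbers the study reports.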
Related papers
- Data-centric Prediction Explanation via Kernelized Stein Discrepancy [14.177012256360635]
This paper presents a Highly-precise and Data-centric Explanation (HD-Explain) prediction explanation method that exploits properties of the Kernelized Stein Discrepancy (KSD).
Specifically, the KSD uniquely defines a parameterized kernel function for a trained model that encodes model-dependent data correlation.
We show that HD-Explain outperforms existing methods in several respects, including preciseness (fine-grained explanations), consistency, and computational efficiency.
arXiv Detail & Related papers (2024-03-22T19:04:02Z)
- Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings [32.03184402316848]
We analyze attribution scores extracted from prompt-based models w.r.t. plausibility and faithfulness.
We find that using the prompting paradigm yields more plausible explanations than fine-tuning the models in low-resource settings.
arXiv Detail & Related papers (2024-03-08T14:14:37Z)
- Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training [30.626080706755822]
Feature attribution methods highlight the important input tokens as explanations to model predictions.
Recent works show that explanations provided by these methods face challenges of being faithful and robust.
We propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification.
arXiv Detail & Related papers (2023-12-29T13:07:07Z)
- Quantifying Representation Reliability in Self-Supervised Learning Models [12.485580780944083]
Self-supervised learning models extract general-purpose representations from data.
We introduce a formal definition of representation reliability.
We propose an ensemble-based method for estimating the representation reliability without knowing the downstream tasks a priori.
arXiv Detail & Related papers (2023-05-31T21:57:33Z)
- Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion.
We first benchmark performance on this task using model accuracy on the few-shot examples.
We show that attribution-based factors can help rank relative model OOD performance.
arXiv Detail & Related papers (2022-10-13T04:45:26Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model feature importance (FI) supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of the designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.