"Will You Find These Shortcuts?" A Protocol for Evaluating the
Faithfulness of Input Salience Methods for Text Classification
- URL: http://arxiv.org/abs/2111.07367v1
- Date: Sun, 14 Nov 2021 15:31:29 GMT
- Title: "Will You Find These Shortcuts?" A Protocol for Evaluating the
Faithfulness of Input Salience Methods for Text Classification
- Authors: Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders
Sandholm, Katja Filippova
- Abstract summary: We present a protocol for faithfulness evaluation that makes use of partially synthetic data to obtain ground truth for feature importance ranking.
We do an in-depth analysis of four standard salience method classes on a range of datasets and shortcuts for BERT and LSTM models.
We recommend following the protocol for each new task and model combination to find the best method for identifying shortcuts.
- Score: 38.22453895596424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution (a.k.a. input salience) methods, which assign an
importance score to each feature, are abundant but may produce surprisingly
different results for the same model on the same input. While differences are expected if
disparate definitions of importance are assumed, most methods claim to provide
faithful attributions and point at the features most relevant for a model's
prediction. Existing work on faithfulness evaluation is not conclusive and does
not provide a clear answer as to how different methods are to be compared.
Focusing on text classification and the model debugging scenario, our main
contribution is a protocol for faithfulness evaluation that makes use of
partially synthetic data to obtain ground truth for feature importance ranking.
Following the protocol, we do an in-depth analysis of four standard salience
method classes on a range of datasets and shortcuts for BERT and LSTM models
and demonstrate that some of the most popular method configurations provide
poor results even for the simplest shortcuts. We recommend following the protocol
for each new task and model combination to find the best method for identifying
shortcuts.
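The protocol lends itself to a compact implementation. Below is a minimal sketch, not the authors' code: it assumes a binary classification task, injects a single hypothetical shortcut token ("zeroa") into examples of one class to create partially synthetic data with known ground truth, and scores any salience method by precision@k, i.e., the fraction of its top-ranked tokens that hit the injected shortcut. The `inject_shortcut` and `precision_at_k` helpers and the `salience` scores are illustrative assumptions; the scores would come from whatever attribution method is under evaluation.

    import random

    SHORTCUT = "zeroa"  # hypothetical injected token; any rare string works

    def inject_shortcut(texts, labels, target_label=1, rate=1.0):
        """Insert the shortcut token at a random position in examples of
        the target class, yielding data with known ground-truth importance."""
        out = []
        for text, label in zip(texts, labels):
            tokens = text.split()
            if label == target_label and random.random() < rate:
                tokens.insert(random.randrange(len(tokens) + 1), SHORTCUT)
            out.append((" ".join(tokens), label))
        return out

    def precision_at_k(salience, tokens, k=None):
        """Fraction of the k most salient positions that are shortcut tokens."""
        gold = {i for i, tok in enumerate(tokens) if tok == SHORTCUT}
        if not gold:
            return None  # example carries no shortcut; skip it
        k = k or len(gold)
        ranked = sorted(range(len(tokens)), key=lambda i: -salience[i])
        return len(gold & set(ranked[:k])) / k

A model trained on the injected data should come to rely on the shortcut, so a faithful method should rank the shortcut tokens at the very top; averaging precision@k over the synthetic test set then yields a ground-truth faithfulness score per method.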
Related papers
- Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods [0.15039745292757667]
We show that saliency methods exhibit weak rank correlations even when applied to the same model instance.
Regularization techniques that increase the faithfulness of attention explanations also increase agreement between saliency methods (a minimal sketch of such an agreement measurement follows this list).
arXiv Detail & Related papers (2022-11-15T18:18:34Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches.
This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to systematic errors, and also suggests how to improve the model.
In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data partitioned according to prediction correctness (a simplified sketch follows this list).
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- Finding Significant Features for Few-Shot Learning using Dimensionality Reduction [0.0]
The proposed module improves accuracy by allowing the similarity function, given by the metric learning method, to use more discriminative features for classification.
Our method outperforms the metric learning baselines on the miniImageNet dataset by around 2% in accuracy.
arXiv Detail & Related papers (2021-07-06T16:36:57Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose MatchVIE, a novel key-value matching model based on a graph neural network for visual information extraction (VIE).
Through key-value matching based on relevancy evaluation, MatchVIE can bypass recognizing various semantics and focus on the relevancy between entities.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- Variable Instance-Level Explainability for Text Classification [9.147707153504117]
We propose a method for extracting variable-length explanations using a set of different feature scoring methods at the instance level.
Our method consistently provides more faithful explanations compared to previous fixed-length and fixed-feature scoring methods for rationale extraction.
arXiv Detail & Related papers (2021-04-16T16:53:48Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
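As referenced in the first related paper above ("Easy to Decide, Hard to Agree"), agreement between saliency methods can be quantified with a rank correlation over per-token importance scores. A minimal sketch, with made-up attribution vectors standing in for, e.g., gradient-based and occlusion-based scores from the same model instance:

    from scipy.stats import spearmanr

    # Hypothetical per-token importance scores from two salience methods
    # applied to the same model and the same six-token input.
    gradient_scores  = [0.91, 0.05, 0.32, 0.11, 0.78, 0.02]
    occlusion_scores = [0.40, 0.10, 0.85, 0.05, 0.33, 0.01]

    # A weak correlation mirrors the finding that methods disagree on
    # token rankings even for a single model instance.
    rho, pvalue = spearmanr(gradient_scores, occlusion_scores)
    print(f"Spearman rank correlation: {rho:.2f} (p={pvalue:.2f})")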
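Likewise, as referenced in the "Label-Descriptive Patterns" entry, the error-characterization idea can be approximated in a few lines. The paper mines succinct pattern sets; the sketch below is a deliberately crude stand-in that only ranks individual tokens by how over-represented they are among misclassified inputs, ignoring token combinations.

    from collections import Counter

    def overrepresented_tokens(texts, correct, top=10):
        """Partition data by prediction correctness and rank tokens by the
        (smoothed) ratio of their relative frequency in the error partition
        to that in the correct partition."""
        err = Counter(t for x, ok in zip(texts, correct) if not ok for t in x.split())
        good = Counter(t for x, ok in zip(texts, correct) if ok for t in x.split())
        n_err, n_good = sum(err.values()) or 1, sum(good.values()) or 1
        score = {t: ((err[t] + 1) / n_err) / ((good[t] + 1) / n_good) for t in err}
        return sorted(score, key=score.get, reverse=True)[:top]

Tokens that score high here are candidates for systematic error patterns, though unlike the paper's pattern sets they capture no interactions between tokens.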
This list is automatically generated from the titles and abstracts of the papers in this site.