"Will You Find These Shortcuts?" A Protocol for Evaluating the
Faithfulness of Input Salience Methods for Text Classification
- URL: http://arxiv.org/abs/2111.07367v1
- Date: Sun, 14 Nov 2021 15:31:29 GMT
- Title: "Will You Find These Shortcuts?" A Protocol for Evaluating the
Faithfulness of Input Salience Methods for Text Classification
- Authors: Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders
Sandholm, Katja Filippova
- Abstract summary: We present a protocol for faithfulness evaluation that makes use of partially synthetic data to obtain ground truth for feature importance ranking.
We do an in-depth analysis of four standard salience method classes on a range of datasets and shortcuts for BERT and LSTM models.
We recommend following the protocol for each new task and model combination to find the best method for identifying shortcuts.
- Score: 38.22453895596424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution (a.k.a. input salience) methods, which assign an
importance score to each feature, are abundant but may produce surprisingly
different results for the same model on the same input. While differences are expected if
disparate definitions of importance are assumed, most methods claim to provide
faithful attributions and point at the features most relevant for a model's
prediction. Existing work on faithfulness evaluation is not conclusive and does
not provide a clear answer as to how different methods are to be compared.
Focusing on text classification and the model debugging scenario, our main
contribution is a protocol for faithfulness evaluation that makes use of
partially synthetic data to obtain ground truth for feature importance ranking.
Following the protocol, we do an in-depth analysis of four standard salience
method classes on a range of datasets and shortcuts for BERT and LSTM models
and demonstrate that some of the most popular method configurations provide
poor results even for the simplest shortcuts. We recommend following the protocol
for each new task and model combination to find the best method for identifying
shortcuts.
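The protocol lends itself to a compact implementation. Below is a minimal sketch, not the authors' code: it assumes a binary classification task, injects a single hypothetical shortcut token ("zeroa") into examples of one class to create partially synthetic data with known ground truth, and scores any salience method by precision@k, i.e., the fraction of its top-ranked tokens that hit the injected shortcut. The `inject_shortcut` and `precision_at_k` helpers and the `salience` scores are illustrative assumptions; the scores would come from whatever attribution method is under evaluation.

    import random

    SHORTCUT = "zeroa"  # hypothetical injected token; any rare string works

    def inject_shortcut(texts, labels, target_label=1, rate=1.0):
        """Insert the shortcut token at a random position in examples of
        the target class, yielding data with known ground-truth importance."""
        out = []
        for text, label in zip(texts, labels):
            tokens = text.split()
            if label == target_label and random.random() < rate:
                tokens.insert(random.randrange(len(tokens) + 1), SHORTCUT)
            out.append((" ".join(tokens), label))
        return out

    def precision_at_k(salience, tokens, k=None):
        """Fraction of the k most salient positions that are shortcut tokens."""
        gold = {i for i, tok in enumerate(tokens) if tok == SHORTCUT}
        if not gold:
            return None  # example carries no shortcut; skip it
        k = k or len(gold)
        ranked = sorted(range(len(tokens)), key=lambda i: -salience[i])
        return len(gold & set(ranked[:k])) / k

A model trained on the injected data should come to rely on the shortcut, so a faithful method should rank the shortcut tokens at the very top; averaging precision@k over the synthetic test set then yields a ground-truth faithfulness score per method.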
Related papers
- Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods [0.15039745292757667]
We show that saliency methods exhibit weak rank correlations even when applied to the same model instance.
Regularization techniques that increase the faithfulness of attention explanations also increase agreement between saliency methods (a minimal sketch of such an agreement measurement follows this list).
arXiv Detail & Related papers (2022-11-15T18:18:34Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches.
This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to systematic errors, and also suggests how to improve the model.
In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data partitioned according to prediction correctness (a simplified sketch follows this list).
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- Finding Significant Features for Few-Shot Learning using Dimensionality Reduction [0.0]
The proposed module improves accuracy by allowing the similarity function, given by the metric learning method, to use more discriminative features for classification.
Our method outperforms the metric learning baselines on the miniImageNet dataset by around 2% in accuracy.
arXiv Detail & Related papers (2021-07-06T16:36:57Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose MatchVIE, a novel key-value matching model based on a graph neural network for visual information extraction (VIE).
Through key-value matching based on relevancy evaluation, MatchVIE can bypass recognizing various semantics and focus on the relevancy between entities.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- Variable Instance-Level Explainability for Text Classification [9.147707153504117]
We propose a method for extracting variable-length explanations using a set of different feature scoring methods at the instance level.
Our method consistently provides more faithful explanations compared to previous fixed-length and fixed-feature scoring methods for rationale extraction.
arXiv Detail & Related papers (2021-04-16T16:53:48Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
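As referenced in the first related paper above ("Easy to Decide, Hard to Agree"), agreement between saliency methods can be quantified with a rank correlation over per-token importance scores. A minimal sketch, with made-up attribution vectors standing in for, e.g., gradient-based and occlusion-based scores from the same model instance:

    from scipy.stats import spearmanr

    # Hypothetical per-token importance scores from two salience methods
    # applied to the same model and the same six-token input.
    gradient_scores  = [0.91, 0.05, 0.32, 0.11, 0.78, 0.02]
    occlusion_scores = [0.40, 0.10, 0.85, 0.05, 0.33, 0.01]

    # A weak correlation mirrors the finding that methods disagree on
    # token rankings even for a single model instance.
    rho, pvalue = spearmanr(gradient_scores, occlusion_scores)
    print(f"Spearman rank correlation: {rho:.2f} (p={pvalue:.2f})")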
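Likewise, as referenced in the "Label-Descriptive Patterns" entry, the error-characterization idea can be approximated in a few lines. The paper mines succinct pattern sets; the sketch below is a deliberately crude stand-in that only ranks individual tokens by how over-represented they are among misclassified inputs, ignoring token combinations.

    from collections import Counter

    def overrepresented_tokens(texts, correct, top=10):
        """Partition data by prediction correctness and rank tokens by the
        (smoothed) ratio of their relative frequency in the error partition
        to that in the correct partition."""
        err = Counter(t for x, ok in zip(texts, correct) if not ok for t in x.split())
        good = Counter(t for x, ok in zip(texts, correct) if ok for t in x.split())
        n_err, n_good = sum(err.values()) or 1, sum(good.values()) or 1
        score = {t: ((err[t] + 1) / n_err) / ((good[t] + 1) / n_good) for t in err}
        return sorted(score, key=score.get, reverse=True)[:top]

Tokens that score high here are candidates for systematic error patterns, though unlike the paper's pattern sets they capture no interactions between tokens.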
This list is automatically generated from the titles and abstracts of the papers in this site.