Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading
- URL: http://arxiv.org/abs/2308.06795v2
- Date: Fri, 31 May 2024 22:41:54 GMT
- Title: Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading
- Authors: Evan Crothers, Herna Viktor, Nathalie Japkowicz
- Abstract summary: We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers.
We explore task-specific considerations that undermine principled comparison of interpretability using iterative masking.
- Score: 5.124348720450654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common approach to quantifying neural text classifier interpretability is to calculate faithfulness metrics based on iteratively masking salient input tokens and measuring changes in the model prediction. We propose that this property is better described as "sensitivity to iterative masking", and highlight pitfalls in using this measure for comparing text classifier interpretability. We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers. We then demonstrate that iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour. We further explore task-specific considerations that undermine principled comparison of interpretability using iterative masking, such as an underlying similarity to salience-based adversarial attacks. Our findings give insight into how these behaviours affect neural text classifiers, and provide guidance on how sensitivity to iterative masking should be interpreted.
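The procedure the abstract describes is easy to make concrete. Below is a minimal, hedged sketch of "sensitivity to iterative masking" for a Transformer encoder classifier: it repeatedly replaces the most salient remaining token with [MASK] and records how much the originally predicted class probability drops. The checkpoint name, the occlusion-based saliency function, and the probability-drop score are illustrative assumptions, not the paper's exact protocol; in practice you would use a fine-tuned classifier and the saliency method under evaluation.
```python
# Sketch: sensitivity to iterative masking for a masked-language-model classifier.
# Assumptions: a BERT-style checkpoint and occlusion saliency; swap in a fine-tuned
# classifier and your own saliency method for a real evaluation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# NOTE: the classification head here is randomly initialised; use a fine-tuned
# checkpoint in practice.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def predict_proba(input_ids, attention_mask):
    """Return class probabilities for a single tokenized example."""
    with torch.no_grad():
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return torch.softmax(logits, dim=-1).squeeze(0)

def saliency_by_occlusion(input_ids, attention_mask, target):
    """Toy saliency: drop in target-class probability when each token is [MASK]ed."""
    base = predict_proba(input_ids, attention_mask)[target]
    scores = torch.full((input_ids.shape[1],), float("-inf"))  # special tokens stay -inf
    for i in range(1, input_ids.shape[1] - 1):                 # skip [CLS] / [SEP]
        masked = input_ids.clone()
        masked[0, i] = tokenizer.mask_token_id
        scores[i] = base - predict_proba(masked, attention_mask)[target]
    return scores

def sensitivity_to_iterative_masking(text, steps=5):
    """Iteratively mask the most salient remaining token and record the change
    in the originally predicted class probability at each step."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
    probs = predict_proba(input_ids, attention_mask)
    target = int(probs.argmax())
    base_prob = float(probs[target])

    drops, masked_ids = [], input_ids.clone()
    for _ in range(steps):
        scores = saliency_by_occlusion(masked_ids, attention_mask, target)
        # never re-mask a position that is already [MASK]
        scores[(masked_ids == tokenizer.mask_token_id).squeeze(0)] = float("-inf")
        masked_ids[0, int(scores.argmax())] = tokenizer.mask_token_id
        drops.append(base_prob - float(predict_proba(masked_ids, attention_mask)[target]))
    return drops  # larger drops = higher sensitivity to iterative masking

print(sensitivity_to_iterative_masking("The film was a complete waste of time."))
```
The same loop also makes the abstract's out-of-distribution concern easy to probe: comparing the encoder's [CLS] embeddings of the progressively masked inputs against embeddings of unmasked training examples gives a rough view of how far iterative masking drifts from the distribution the model was trained on.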
Related papers
- Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation [26.544938760265136]
Deep neural classifiers tend to rely on spurious correlations between incidental (spurious) attributes of inputs and targets when making predictions.
We propose a self-guided spurious correlation mitigation framework.
We show that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori.
arXiv Detail & Related papers (2024-05-06T17:12:21Z)
- Regressor-Segmenter Mutual Prompt Learning for Crowd Counting [70.49246560246736]
We propose mutual prompt learning (mPrompt) to address the bias and inaccuracy caused by annotation variance.
Experiments show that mPrompt significantly reduces the Mean Absolute Error (MAE).
arXiv Detail & Related papers (2023-12-04T07:53:59Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework in which the key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
We induce a chosen latent distribution, rather than relying on the implicit assumption made by a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces [12.514483749037998]
We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers.
We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers.
arXiv Detail & Related papers (2022-06-10T17:54:46Z)
- On the rate of convergence of a classifier based on a Transformer encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z)
- Disentangling Representations of Text by Masking Transformers [27.6903196190087]
We learn binary masks over transformer weights or hidden units to uncover subsets of features that correlate with a specific factor of variation.
We evaluate this method with respect to its ability to disentangle representations of sentiment from genre in movie reviews, "toxicity" from dialect in Tweets, and syntax from semantics.
arXiv Detail & Related papers (2021-04-14T22:45:34Z)
- Disentangled Contrastive Learning for Learning Robust Textual Representations [13.880693856907037]
We introduce the concept of momentum representation consistency to align features, and leverage power normalization while conforming to the uniformity property.
Experimental results on NLP benchmarks demonstrate that our approach outperforms the baselines.
arXiv Detail & Related papers (2021-04-11T03:32:49Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour for the learned representations, as well as the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers [21.594361495948316]
A new line of work on improving model interpretability has recently emerged, and many existing methods require either prior information or human annotations as additional inputs during training.
We propose the variational word mask (VMASK) method to automatically learn task-specific important words and reduce irrelevant information for classification, which ultimately improves the interpretability of model predictions.
arXiv Detail & Related papers (2020-10-01T20:02:43Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)