Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
- URL: http://arxiv.org/abs/2103.06922v1
- Date: Thu, 11 Mar 2021 19:39:56 GMT
- Title: Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
- Authors: Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun and Xia Hu
- Abstract summary: We show that trained NLU models have a strong preference for features located at the head of the long-tailed distribution.
We propose a shortcut mitigation framework to discourage the model from making overconfident predictions for samples with a large shortcut degree.
- Score: 53.36605766266518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies indicate that NLU models are prone to rely on shortcut
features for prediction. As a result, these models could potentially fail to
generalize to real-world out-of-distribution scenarios. In this work, we show
that the shortcut learning behavior can be explained by the long-tailed
phenomenon. There are two findings: 1) trained NLU models have a strong
preference for features located at the head of the long-tailed distribution,
and 2) shortcut features are picked up during the very first iterations of
model training. These two observations are then used to formulate a
measurement that quantifies the shortcut degree of each training sample.
Based on this shortcut measurement, we propose a shortcut mitigation framework
that discourages the model from making overconfident predictions for samples
with a large shortcut degree. Experimental results on three NLU benchmarks
demonstrate that our long-tailed distribution explanation accurately reflects
the shortcut learning behavior of NLU models. Further analysis indicates that
our method improves generalization accuracy on OOD data while preserving
accuracy on in-distribution test data.
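To make the mitigation idea concrete, below is a minimal sketch of one way such a framework could be instantiated: one-hot targets are softened in proportion to a per-sample shortcut degree, so high-shortcut samples cannot be fit with full confidence. The function names and the linear smoothing scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels, shortcut_degree, num_classes):
    """Soften one-hot targets in proportion to each sample's shortcut degree.

    labels: (B,) int64 class indices
    shortcut_degree: (B,) float in [0, 1]; 1 = sample relies heavily on
        head-of-distribution (shortcut) features. How this score is computed
        is assumed to come from the paper's long-tailed measurement.
    """
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    # The more shortcut-like a sample is, the more its target is pushed
    # toward the uniform distribution, discouraging overconfident fits.
    alpha = shortcut_degree.unsqueeze(1)
    return (1 - alpha) * one_hot + alpha * uniform

def mitigation_loss(logits, labels, shortcut_degree):
    """Soft cross-entropy against the smoothed targets."""
    targets = smoothed_targets(labels, shortcut_degree, logits.size(-1))
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

During training, this loss would replace standard cross-entropy, with the shortcut degree supplied by the long-tailed measurement described above.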
Related papers
- Sparse Prototype Network for Explainable Pedestrian Behavior Prediction [60.80524827122901]
We present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose.
Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features.
arXiv Detail & Related papers (2024-10-16T03:33:40Z)
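As a rough illustration of the prototype idea (hypothetical names; the clustering term below is one common formulation, and SPN's actual mono-semanticity constraint is not reproduced here):

```python
import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    """Illustrative prototype layer: class scores come from similarities
    between an input embedding and a set of learned prototype vectors."""

    def __init__(self, embed_dim, num_prototypes, num_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, z):                        # z: (B, embed_dim)
        dists = torch.cdist(z, self.prototypes)  # (B, num_prototypes)
        return self.classifier(-dists), dists    # closer prototype => higher score

def clustering_loss(dists):
    # Pull every embedding toward its nearest prototype so prototypes
    # settle on recurring, human-inspectable patterns.
    return dists.min(dim=1).values.mean()
```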
- Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding [5.4480125359160265]
We propose pessimistically aggregating the predictions of a mixture-of-experts, assuming each expert captures relatively different latent features.
The experimental results demonstrate that our post-hoc control over the experts significantly enhances the model's robustness to distribution shifts in shortcuts.
arXiv Detail & Related papers (2024-06-17T20:00:04Z)
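One simple reading of "pessimistic aggregation" is an elementwise worst case over experts; the sketch below uses a min-over-experts rule, which is an assumption rather than the paper's exact aggregation:

```python
import torch

def pessimistic_aggregate(expert_logits):
    """Aggregate a mixture of experts pessimistically: a class only scores
    highly if *every* expert assigns it reasonable probability.

    expert_logits: (num_experts, B, num_classes)
    """
    probs = torch.softmax(expert_logits, dim=-1)
    worst_case = probs.min(dim=0).values                       # min over experts
    return worst_case / worst_case.sum(dim=-1, keepdim=True)   # renormalize
```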
- On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias toward more available features.
arXiv Detail & Related papers (2023-10-24T22:54:05Z)
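A toy probe in the spirit of that setup (entirely illustrative data, not the paper's controlled experiments): a weakly predictive low-magnitude "core" feature competes with a large-magnitude, highly "available" shortcut that decorrelates from the label at test time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def make_data(n, shortcut_corr):
    y = rng.integers(0, 2, n)
    core = (2 * y - 1) * 0.5 + rng.normal(0, 1, n)             # predictive, low-magnitude
    agree = rng.random(n) < shortcut_corr
    shortcut = np.where(agree, 2 * y - 1, -(2 * y - 1)) * 3.0  # large-scale, "available"
    return np.column_stack([core, shortcut]), y

X_tr, y_tr = make_data(5000, shortcut_corr=0.9)   # shortcut mostly agrees with label
X_te, y_te = make_data(5000, shortcut_corr=0.5)   # shortcut uninformative at test time

for model in (LogisticRegression(),
              MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "OOD accuracy:", round(model.score(X_te, y_te), 3))
```

Comparing the two models' out-of-distribution accuracy probes how much each leans on the available shortcut rather than the core feature.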
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
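The alignment step can be pictured with Python's built-in ast module. This is a simplified stand-in, not the ASTxplainer implementation: per-token model scores (e.g., log-probs) are attached to every AST node whose source span contains the token, then averaged per node.

```python
import ast

def char_span(node, line_starts):
    """Character-offset span of an AST node (needs Python 3.8+ end positions)."""
    if not hasattr(node, "lineno") or getattr(node, "end_lineno", None) is None:
        return None
    return (line_starts[node.lineno - 1] + node.col_offset,
            line_starts[node.end_lineno - 1] + node.end_col_offset)

def align_scores_to_ast(source, token_offsets, token_scores):
    """token_offsets: list of (start, end) character offsets, one per token;
    token_scores: one model score per token."""
    line_starts, pos = [], 0
    for line in source.splitlines(keepends=True):
        line_starts.append(pos)
        pos += len(line)
    per_node = {}
    for node in ast.walk(ast.parse(source)):
        span = char_span(node, line_starts)
        if span is None:
            continue
        inside = [s for (ts, te), s in zip(token_offsets, token_scores)
                  if ts >= span[0] and te <= span[1]]
        if inside:
            per_node[(type(node).__name__, span)] = sum(inside) / len(inside)
    return per_node
```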
- How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning [20.486639064376014]
Shortcut learning, or the Clever Hans effect, refers to situations where a learning agent learns spurious correlations present in data, resulting in biased models.
We focus on finding shortcuts in deep learning based spoofing countermeasures (CMs) that predict whether a given utterance is spoofed or not.
arXiv Detail & Related papers (2023-05-31T15:58:37Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z)
- Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - in building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z)
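A toy numerical illustration of the fit-large-then-prune recipe (magnitude pruning on an overparameterized linear model; names and constants are illustrative, not the paper's theoretical setting):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d, k = 200, 500, 20                         # d >> n: overparameterized regime
w_true = np.zeros(d)
w_true[:k] = rng.normal(size=k)                # only k features carry signal
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

big = Ridge(alpha=1e-3).fit(X, y)              # 1) fit the large model first
keep = np.argsort(np.abs(big.coef_))[-k:]      # 2) keep largest-magnitude weights
small = Ridge(alpha=1e-3).fit(X[:, keep], y)   # 3) retrain the pruned model

X_te = rng.normal(size=(2000, d))              # fresh data to judge generalization
y_te = X_te @ w_true
print("pruned-from-large R^2:", round(small.score(X_te[:, keep], y_te), 3))
```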
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
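The core estimator can be sketched for a conjugate Bayesian linear model, where the sum of log posterior predictive probabilities over the training sequence equals the log marginal likelihood exactly; the paper extends this idea to networks trained with gradient descent, and the names below are illustrative.

```python
import numpy as np

def log_marginal_via_training(X, y, noise_var=0.1, prior_var=1.0):
    """Prequential decomposition: log p(y) = sum_i log p(y_i | y_<i).

    For conjugate Bayesian linear regression this sum is exactly the log
    marginal likelihood; "training speed" corresponds to how quickly the
    per-sample log predictive probabilities improve as data arrives.
    """
    d = X.shape[1]
    S = np.eye(d) * prior_var              # prior covariance over weights
    m = np.zeros(d)                        # prior mean over weights
    total = 0.0
    for x, t in zip(X, y):
        pred_mean = x @ m
        pred_var = x @ S @ x + noise_var   # predictive variance for this point
        total += -0.5 * (np.log(2 * np.pi * pred_var)
                         + (t - pred_mean) ** 2 / pred_var)
        Sx = S @ x                         # rank-one Bayesian (Kalman-style) update
        m = m + Sx * (t - pred_mean) / (noise_var + x @ Sx)
        S = S - np.outer(Sx, Sx) / (noise_var + x @ Sx)
    return total
```

Comparing this quantity across candidate models ranks them by marginal likelihood, i.e., by how quickly each one "learns" the training data.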