Related papers: Understanding the Failure Modes of Out-of-Distribution Generalization


URL: http://arxiv.org/abs/2010.15775v3
Date: Fri, 6 Sep 2024 22:45:47 GMT
Title: Understanding the Failure Modes of Out-of-Distribution Generalization
Authors: Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur
Abstract summary: Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks.
Score: 35.00563456450452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy at test time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on some easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from two kinds of skew that spurious correlations induce in the data: one geometric in nature and the other statistical in nature. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.
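The abstract's setting (a gradient-descent-trained linear classifier on an easy task that also contains a spuriously correlated feature) can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the paper's construction: a hypothetical 2-D dataset where an invariant feature x_inv equals the label for every point, and a spurious feature x_sp agrees with the label for 90% of points. Here the max-margin classifier puts zero weight on x_sp, and gradient descent on the logistic loss is known to converge in direction to that solution only logarithmically slowly. The sketch shows that after a finite number of steps the learned spurious weight is still clearly positive, which loosely mirrors the statistical-skew intuition.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_data(n: int = 100, minority_frac: float = 0.1):
    """Hypothetical 2-D task with x = (x_inv, x_sp).

    x_inv = y for every point, so the invariant feature alone separates the classes.
    x_sp = y for the majority group and -y for a small minority group, so the
    spurious feature agrees with the label on (1 - minority_frac) of the data.
    """
    n_minority = int(n * minority_frac)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        x_sp = -y if i < n_minority else y
        data.append(((float(y), float(x_sp)), y))
    return data

def train_logistic_gd(data, lr: float = 0.5, steps: int = 2000):
    """Full-batch gradient descent on the mean logistic loss, starting from w = 0."""
    w = [0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # d/dw log(1 + exp(-margin)) = -y * x * sigmoid(-margin)
            coeff = -y * sigmoid(-margin) / n
            grad[0] += coeff * x[0]
            grad[1] += coeff * x[1]
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w

w_inv, w_sp = train_logistic_gd(make_toy_data())
# Max-margin direction here is (1, 0): majority margins are w_inv + w_sp and
# minority margins are w_inv - w_sp, so any w_sp != 0 lowers the smaller one.
print(f"w_inv={w_inv:.3f}  w_sp={w_sp:.3f}  ratio={w_sp / w_inv:.3f}")
```

Running for more steps shrinks the ratio w_sp / w_inv only slowly, because w_inv grows roughly logarithmically while w_sp settles near a fixed positive value.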

Related papers

Ascent Fails to Forget [45.75497227694833]
We show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning. We attribute this phenomenon to the inherent statistical dependence between the forget and retain data sets. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail.
arXiv Detail & Related papers (2025-09-30T15:48:49Z)
Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks [0.0]
We propose a technique to guide the neural network through the training phase. We form a corrective gradient complementing the traditional gradient descent approach. Experiments on ColoredMNIST, Waterbird-100, and CelebANIST datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-02-26T09:36:00Z)
Repairing Neural Networks by Leaving the Right Past Behind [23.78437548836594]
Prediction failures of machine learning models often arise from deficiencies in training data. This work develops a generic framework for both identifying training examples that have given rise to the target failure, and fixing the model through erasing information about them.
arXiv Detail & Related papers (2022-07-11T12:07:39Z)
Debugging using Orthogonal Gradient Descent [7.766921168069532]
Given a trained model that is partially faulty, can we correct its behaviour without having to train the model from scratch? In other words, can we "debug" neural networks similar to how we address bugs in our mathematical models and standard computer code?
arXiv Detail & Related papers (2022-06-17T00:03:54Z)
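The core operation of orthogonal gradient descent is projecting a new update onto the orthogonal complement of stored gradient directions, so that the update does not disturb what those directions encode. The sketch below shows only that generic projection step; which gradients the paper stores and how it applies them to debugging are not described here, and the helper names (add_to_basis, project_out) are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add_to_basis(basis, g, tol=1e-12):
    """Gram-Schmidt: orthogonalize g against the basis and append it if non-zero."""
    v = list(g)
    for u in basis:
        c = dot(v, u)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    norm = dot(v, v) ** 0.5
    if norm > tol:
        basis.append([vi / norm for vi in v])
    return basis

def project_out(g, basis):
    """Remove from g its components along every stored (orthonormal) direction."""
    v = list(g)
    for u in basis:
        c = dot(v, u)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    return v

basis = []
for stored in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]):  # hypothetical stored gradients
    add_to_basis(basis, stored)
update = project_out([1.0, 2.0, 3.0], basis)
print(update)  # → [0.0, 0.0, 3.0]
```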
Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset. We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method. We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
arXiv Detail & Related papers (2022-06-07T20:47:07Z)
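To make "pointwise-robust to label bias" concrete, here is a naive brute-force check, not the paper's exact or approximate method. It assumes bias means that up to k binary 0/1 training labels may have been flipped, fits 1-D least-squares regression, and thresholds the prediction at 0.5; the function names and threshold are our own choices. A test point is certified robust if no such flip changes its decision.

```python
from itertools import combinations

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1-D, closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict_class(xs, ys, x_test, threshold=0.5):
    slope, intercept = fit_line(xs, ys)
    return slope * x_test + intercept >= threshold

def is_robust(xs, ys, x_test, k):
    """True if flipping any <= k of the 0/1 labels leaves the test decision unchanged.

    Enumerates every label subset of size <= k, so it is only usable on tiny examples.
    """
    base = predict_class(xs, ys, x_test)
    for r in range(1, k + 1):
        for idx in combinations(range(len(ys)), r):
            flipped = [1 - y if i in idx else y for i, y in enumerate(ys)]
            if predict_class(xs, flipped, x_test) != base:
                return False
    return True

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(is_robust(xs, ys, x_test=0.0, k=1))  # → True
print(is_robust(xs, ys, x_test=0.0, k=2))  # → False
```

The enumeration grows combinatorially in k, which is why exact and scalable certification methods are of interest.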
On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification. We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned. Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error. We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model [24.679669970832396]
Teacher-student models provide a powerful framework in which the typical-case performance of high-dimensional supervised learning tasks can be studied in closed form. In this setting, labels are assigned to data (often taken to be Gaussian i.i.d.) by a teacher model, and the goal is to characterise the typical performance of the student model in recovering the parameters that generated the labels.
arXiv Detail & Related papers (2021-02-16T12:49:15Z)
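A minimal simulation of the teacher-student setup described above: i.i.d. Gaussian inputs, a random linear teacher that assigns sign labels, and a student whose recovery of the teacher is measured by cosine overlap. The student here is a simple Hebbian estimator (the average of y * x), chosen only for brevity; it is an assumption and not the learning rule or feature maps studied in the paper.

```python
import math
import random

def teacher_student_overlap(n: int = 2000, d: int = 10, seed: int = 0) -> float:
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(d)]
    student = [0.0] * d
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # i.i.d. Gaussian input
        # The teacher assigns the label.
        y = 1.0 if sum(t * xi for t, xi in zip(teacher, x)) >= 0 else -1.0
        for j in range(d):
            student[j] += y * x[j] / n  # Hebbian student: running mean of y * x
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return dot / (norm_s * norm_t)

print(f"cosine(student, teacher) = {teacher_student_overlap():.3f}")
```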
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
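A toy illustration of the starvation effect in a linear model trained with cross-entropy: when a strong feature is present, the margin grows quickly, the shared factor sigmoid(-margin) shrinks, and the weight on a weaker but equally predictive feature grows less than it would if that feature were trained alone. This is our own two-example sketch under stated assumptions (feature scales 5 and 1, learning rate 0.1), not the paper's analysis.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, steps=500):
    """Full-batch gradient descent on the mean logistic loss from w = 0."""
    d, n = len(xs[0]), len(xs)
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            coeff = -y * sigmoid(-margin) / n
            for j in range(d):
                grad[j] += coeff * x[j]
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

ys = [1, -1]
STRONG = 5.0  # hypothetical scale of the dominant feature
w_joint = train_logistic([[STRONG, 1.0], [-STRONG, -1.0]], ys)  # both features present
w_alone = train_logistic([[1.0], [-1.0]], ys)                   # weak feature only
print(f"weak-feature weight: joint={w_joint[1]:.3f}  alone={w_alone[0]:.3f}")
```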
On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models. We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models. We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
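For concreteness, this is what using categorical cross-entropy with a target on the simplex looks like in the label-smoothing case mentioned above. The sketch uses the standard smoothing rule ((1 - eps) on the true class plus eps / K spread uniformly); it does not show the probabilistically inspired alternatives the paper proposes.

```python
import math

def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def smooth_labels(label, num_classes, eps):
    """Label smoothing: (1 - eps) on the true class plus eps / K spread uniformly."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]

def soft_cross_entropy(logits, target):
    """Categorical cross-entropy -sum_k t_k log p_k with a target t on the simplex."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

target = smooth_labels(label=0, num_classes=3, eps=0.1)
print(target)
print(soft_cross_entropy([2.0, 0.5, -1.0], target))
```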
Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
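One way to read "gradient supervision" from counterfactual pairs is as a loss that asks the model's input gradient to point along the difference between an example and its minimally different counterfactual. The sketch below is a simplified version of that idea under our own assumptions: a hypothetical linear scorer s(x) = w . x, labels in {-1, +1}, and a loss of 1 - cos(gradient, x_cf - x). The paper's exact formulation (which output's gradient is used and how the term is weighted against the task loss) may differ.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def gradient_supervision_loss(w, x, x_cf, y_cf):
    """1 - cos(input gradient, x_cf - x) for a hypothetical linear scorer s(x) = w . x.

    For the score y_cf * s(x) of the counterfactual's label (y_cf in {-1, +1}),
    the gradient with respect to the input is y_cf * w at every x.
    """
    grad = [y_cf * wi for wi in w]
    diff = [b - a for a, b in zip(x, x_cf)]
    return 1.0 - cosine(grad, diff)

# A minimally different pair: only the first coordinate changes, and the label flips to +1.
x, x_cf = [0.0, 1.0], [1.0, 1.0]
print(gradient_supervision_loss([2.0, 0.0], x, x_cf, y_cf=1))  # → 0.0 (aligned with the edit)
print(gradient_supervision_loss([0.0, 2.0], x, x_cf, y_cf=1))  # → 1.0 (uses only the unchanged coordinate)
```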

This list is automatically generated from the titles and abstracts of the papers on this site.