Related papers: Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

URL: http://arxiv.org/abs/2206.00501v2
Date: Mon, 3 Apr 2023 13:32:08 GMT
Title: Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
Authors: Kaiyue Wen, Jiaye Teng, Jingzhao Zhang
Abstract summary: We show that a ResNet model overfits benignly on Cifar10 but not benignly on ImageNet. Our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.
Score: 8.696962915720174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Studies on benign overfitting provide insights for the success of overparameterized deep learning models. In this work, we examine whether overfitting is truly benign in real-world classification tasks. We start with the observation that a ResNet model overfits benignly on Cifar10 but not benignly on ImageNet. To understand why benign overfitting fails in the ImageNet experiment, we theoretically analyze benign overfitting under a more restrictive setup where the number of parameters is not significantly larger than the number of data points. Under this mild overparameterization setup, our analysis identifies a phase change: unlike in the previous heavy overparameterization settings, benign overfitting can now fail in the presence of label noise. Our analysis explains our empirical observations, and is validated by a set of control experiments with ResNets. Our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.

Related papers

Are vision language models robust to uncertain inputs? [5.249651874118556]
We show that newer and larger vision language models exhibit improved robustness compared to earlier models, but still suffer from a tendency to strictly follow instructions. For natural images such as ImageNet, this limitation can be overcome without pipeline modifications. We propose a novel mechanism based on caption diversity to reveal a model's internal uncertainty.
arXiv Detail & Related papers (2025-05-17T03:16:49Z)
Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective [5.482832675034467]
We show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.
arXiv Detail & Related papers (2025-02-07T00:55:05Z)
SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract. We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning. We use the sparse double descent approach to univocally identify and characterise pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees [20.2407347618552]
We study the generalization properties of fine-tuning to understand the problem of overfitting. We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model.
arXiv Detail & Related papers (2022-06-06T14:52:46Z)
On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective. We find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities. We empirically explore whether our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z)
On the (Un-)Avoidability of Adversarial Examples [4.822598110892847]
Adversarial examples in deep learning models have caused substantial concern over their reliability. We provide a framework for determining whether a model's label change under small perturbation is justified. We prove that our adaptive data-augmentation maintains consistency of 1-nearest neighbor classification under deterministic labels.
arXiv Detail & Related papers (2021-06-24T21:35:25Z)
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption. They can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence. We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting. Perhaps surprisingly, we also show that over-confidence is not always the case.
arXiv Detail & Related papers (2021-02-15T21:38:09Z)
Second-Moment Loss: A Novel Regression Objective for Improved Uncertainties [7.766663822644739]
Quantification of uncertainty is one of the most promising approaches to establish safe machine learning. One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice. We propose a new objective, referred to as second-moment loss, to address this issue.
arXiv Detail & Related papers (2020-12-23T14:17:33Z)
A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries. Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics. This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.