Related papers: Rethinking Benign Overfitting in Two-Layer Neural Networks

Rethinking Benign Overfitting in Two-Layer Neural Networks

URL: http://arxiv.org/abs/2502.11893v1
Date: Mon, 17 Feb 2025 15:20:04 GMT
Title: Rethinking Benign Overfitting in Two-Layer Neural Networks
Authors: Ruichen Xu, Kexin Chen,
Abstract summary: We refine the feature-noise data model by incorporating class-dependent heterogeneous noise and re-examine the overfitting phenomenon in neural networks.<n>Our findings reveal that neural networks can leverage "data noise", previously deemed harmful, to learn implicit features that improve the classification accuracy for long-tailed data.
Score: 2.486161976966064
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent theoretical studies (Kou et al., 2023; Cao et al., 2022) have revealed a sharp phase transition from benign to harmful overfitting when the noise-to-feature ratio exceeds a threshold-a situation common in long-tailed data distributions where atypical data is prevalent. However, harmful overfitting rarely happens in overparameterized neural networks. Further experimental results suggested that memorization is necessary for achieving near-optimal generalization error in long-tailed data distributions (Feldman & Zhang, 2020). We argue that this discrepancy between theoretical predictions and empirical observations arises because previous feature-noise data models overlook the heterogeneous nature of noise across different data classes. In this paper, we refine the feature-noise data model by incorporating class-dependent heterogeneous noise and re-examine the overfitting phenomenon in neural networks. Through a comprehensive analysis of the training dynamics, we establish test loss bounds for the refined model. Our findings reveal that neural networks can leverage "data noise", previously deemed harmful, to learn implicit features that improve the classification accuracy for long-tailed data. Experimental validation on both synthetic and real-world datasets supports our theoretical results.

Related papers

Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization [27.39922946288783]
We investigate how neural networks exhibit shape bias during training on synthetic datasets. Shape bias varies across network architectures and types of supervision. We propose a novel interpretation of shape bias as a tool for estimating the diversity of samples within a dataset.
arXiv Detail & Related papers (2023-11-10T18:25:44Z)
Understanding Pathologies of Deep Heteroskedastic Regression [25.509884677111344]
Heteroskedastic models predict both mean and residual noise for each data point. At one extreme, these models fit all training data perfectly, eliminating residual noise entirely. At the other, they overfit the residual noise while predicting a constant, uninformative mean. We observe a lack of middle ground, suggesting a phase transition dependent on model regularization strength.
arXiv Detail & Related papers (2023-06-29T06:31:27Z)
Deep Double Descent via Smooth Interpolation [2.141079906482723]
We quantify sharpness of fit of training data by studying the loss landscape w.r.t. to the input variable locally to each training point. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy targets. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, in contrast to existing intuition.
arXiv Detail & Related papers (2022-09-21T02:46:13Z)
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to generalize by gradient descent. We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z)
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error. We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data. We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss. We examine how these benign overfitting phenomena occur in a two-layer neural network setting. We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model. We derive novel non-asymptotic bounds on the classification error of the latter. Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge. We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously. Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.