Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from
KKT Conditions for Margin Maximization
- URL: http://arxiv.org/abs/2303.01462v1
- Date: Thu, 2 Mar 2023 18:24:26 GMT
- Title: Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from
KKT Conditions for Margin Maximization
- Authors: Spencer Frei and Gal Vardi and Peter L. Bartlett and Nathan Srebro
- Abstract summary: Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards satisfying the Karush-Kuhn-Tucker (KKT) conditions for margin maximization.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
- Score: 59.038366742773164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear classifiers and leaky ReLU networks trained by gradient flow on the
logistic loss have an implicit bias towards solutions which satisfy the
Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we
establish a number of settings where the satisfaction of these KKT conditions
implies benign overfitting in linear classifiers and in two-layer leaky ReLU
networks: the estimators interpolate noisy training data and simultaneously
generalize well to test data. The settings include variants of the noisy
class-conditional Gaussians considered in previous work as well as new
distributional settings where benign overfitting has not been previously
observed. The key ingredient to our proof is the observation that when the
training data is nearly-orthogonal, both linear classifiers and leaky ReLU
networks satisfying the KKT conditions for their respective margin maximization
problems behave like a nearly uniform average of the training examples.
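The "nearly uniform average" observation can be checked numerically. Below is a minimal sketch (not the paper's proof technique): when the dimension greatly exceeds the sample size, random Gaussian examples are nearly orthogonal, and the direction found by gradient descent on the logistic loss stays close to the plain label-weighted average of the training examples. The data, step size, and iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2000  # d >> n makes random Gaussian examples nearly orthogonal
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Gradient descent on the logistic loss; its direction tends toward the
# max-margin (KKT) solution for linearly separable data.
w = np.zeros(d)
lr = 0.01
for _ in range(2000):
    margins = y * (X @ w)
    grad = -(y / (1.0 + np.exp(margins))) @ X
    w -= lr * grad

# Compare against the uniform average of label-weighted examples.
w_avg = y @ X
cos = w @ w_avg / (np.linalg.norm(w) * np.linalg.norm(w_avg))
print(round(cos, 3))  # cosine similarity close to 1 in this regime
```

With nearly-orthogonal data the dual coefficients of the max-margin problem are nearly equal, which is why the learned direction barely deviates from the uniform average.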
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Gradient descent provably escapes saddle points in the training of shallow ReLU networks [6.458742319938318]
We prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements.
Building on a detailed examination of critical points of the square integral loss function for shallow ReLU and leaky ReLU networks, we show that gradient descent escapes most saddle points.
arXiv Detail & Related papers (2022-08-03T14:08:52Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks offer improved accuracy and significantly reduced memory consumption, but they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Risk Bounds for Over-parameterized Maximum Margin Classification on
Sub-Gaussian Mixtures [100.55816326422773]
We study the benign overfitting phenomenon of the maximum margin classifier for linear classification problems.
Our results precisely characterize the condition under which benign overfitting can occur.
arXiv Detail & Related papers (2021-04-28T08:25:16Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - DAGs with No Fears: A Closer Look at Continuous Optimization for
Learning Bayesian Networks [45.3591788771536]
We re-examine a continuous optimization framework dubbed NOTEARS for learning Bayesian networks.
We show that the Karush-Kuhn-Tucker optimality conditions for the NOTEARS formulation cannot be satisfied except in a trivial case.
Some combinations with local search are both more accurate and more efficient than the original NOTEARS.
arXiv Detail & Related papers (2020-10-18T22:59:37Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
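A binning-free KS-style calibration measure can be sketched as follows: sort predictions by confidence and take the largest gap between the cumulative predicted confidence and the cumulative empirical accuracy. This is a minimal illustration of the idea behind the paper's KS error, not its exact estimator (the paper additionally uses spline recalibration); the synthetic data below is an assumption for demonstration.

```python
import numpy as np

def ks_calibration_error(conf, correct):
    """Max gap between cumulative predicted confidence and cumulative
    accuracy after sorting by confidence (a Kolmogorov-Smirnov-style,
    binning-free measure)."""
    order = np.argsort(conf)
    p = np.asarray(conf, dtype=float)[order]
    a = np.asarray(correct, dtype=float)[order]
    return np.max(np.abs(np.cumsum(p - a))) / len(p)

# Synthetic, perfectly calibrated scores: accuracy matches confidence,
# so the KS error should be small (shrinking with sample size).
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)
correct = (rng.uniform(size=10_000) < conf).astype(float)
print(ks_calibration_error(conf, correct))
```

Because no binning is involved, the measure avoids the bias-variance trade-off of choosing bin widths that affects expected-calibration-error estimates.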
arXiv Detail & Related papers (2020-06-23T07:18:05Z) - Finite-sample Analysis of Interpolating Linear Classifiers in the
Overparameterized Regime [16.1176305285103]
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification.
We analyze this algorithm applied to random data including misclassification noise.
arXiv Detail & Related papers (2020-04-25T00:06:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.