Finite-sample Analysis of Interpolating Linear Classifiers in the
Overparameterized Regime
- URL: http://arxiv.org/abs/2004.12019v4
- Date: Tue, 1 Jun 2021 18:03:02 GMT
- Title: Finite-sample Analysis of Interpolating Linear Classifiers in the
Overparameterized Regime
- Authors: Niladri S. Chatterji, Philip M. Long
- Abstract summary: We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification.
We analyze this algorithm applied to random data including misclassification noise.
- Score: 16.1176305285103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove bounds on the population risk of the maximum margin algorithm for
two-class linear classification. For linearly separable training data, the
maximum margin algorithm has been shown in previous work to be equivalent to a
limit of training with logistic loss using gradient descent, as the training
error is driven to zero. We analyze this algorithm applied to random data
including misclassification noise. Our assumptions on the clean data include
the case in which the class-conditional distributions are standard normal
distributions. The misclassification noise may be chosen by an adversary,
subject to a limit on the fraction of corrupted labels. Our bounds show that,
with sufficient over-parameterization, the maximum margin algorithm trained on
noisy data can achieve nearly optimal population risk.
Related papers
- Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from
KKT Conditions for Margin Maximization [59.038366742773164]
Linears and leaky ReLU trained by gradient flow on logistic loss have an implicit bias towards satisfying the Karush-KuTucker (KKT) conditions.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
arXiv Detail & Related papers (2023-03-02T18:24:26Z) - Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z) - Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z) - Risk Bounds for Over-parameterized Maximum Margin Classification on
Sub-Gaussian Mixtures [100.55816326422773]
We study the phenomenon of the maximum margin classifier for linear classification problems.
Our results precisely characterize the condition under which benign overfitting can occur.
arXiv Detail & Related papers (2021-04-28T08:25:16Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Classification with abstention but without disparities [5.025654873456756]
We build a general purpose classification algorithm, which is able to abstain from prediction, while avoiding disparate impact.
We establish finite sample risk, fairness, and abstention guarantees for the proposed algorithm.
Our method empirically shows that moderate abstention rates allow to bypass the risk-fairness trade-off.
arXiv Detail & Related papers (2021-02-24T12:43:55Z) - Learning Noise Transition Matrix from Only Noisy Labels via Total
Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously.
We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
arXiv Detail & Related papers (2021-02-04T05:09:18Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - A termination criterion for stochastic gradient descent for binary
classification [3.216356957608319]
We propose a new, simple, and inexpensive termination test for constant step-size gradient descent.
We show that our test terminates in a finite number of iterations and when the noise in the data is not too large, the expected classifier at termination nearly minimizes the probability of misclassification.
arXiv Detail & Related papers (2020-03-23T15:00:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.