On the Error Resistance of Hinge Loss Minimization
- URL: http://arxiv.org/abs/2012.00989v1
- Date: Wed, 2 Dec 2020 06:49:24 GMT
- Title: On the Error Resistance of Hinge Loss Minimization
- Authors: Kunal Talwar
- Abstract summary: We identify a set of conditions on the data under which surrogate loss minimization algorithms provably learn the correct classifier.
In particular, we show that if the data is linearly classifiable with a slightly non-trivial margin, then surrogate loss minimization has negligible error on the uncorrupted data.
- Score: 30.808062097285706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonly used classification algorithms in machine learning, such as support
vector machines, minimize a convex surrogate loss on training examples. In
practice, these algorithms are surprisingly robust to errors in the training
data. In this work, we identify a set of conditions on the data under which
such surrogate loss minimization algorithms provably learn the correct
classifier. This allows us to establish, in a unified framework, the robustness
of these algorithms under various models on data as well as error. In
particular, we show that if the data is linearly classifiable with a slightly
non-trivial margin (i.e. a margin at least $C/\sqrt{d}$ for $d$-dimensional
unit vectors), and the class-conditional distributions are near isotropic and
logconcave, then surrogate loss minimization has negligible error on the
uncorrupted data even when a constant fraction of examples are adversarially
mislabeled.
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z) - Robust Online Covariance and Sparse Precision Estimation Under Arbitrary
Data Corruption [1.5850859526672516]
We introduce a modified trimmed-inner-product algorithm to robustly estimate the covariance in an online scenario.
We provide the error-bound and convergence properties of the estimates to the true precision matrix under our algorithms.
arXiv Detail & Related papers (2023-09-16T05:37:28Z) - Throwing Away Data Improves Worst-Class Error in Imbalanced
Classification [36.91428748713018]
Class imbalances pervade classification problems, yet their treatment differs in theory and practice.
We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
arXiv Detail & Related papers (2022-05-23T23:43:18Z) - The Interplay between Distribution Parameters and the
Accuracy-Robustness Tradeoff in Classification [0.0]
Adrial training tends to result in models that are less accurate on natural (unperturbed) examples compared to standard models.
This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution.
In this work, we focus on the latter case under a binary Gaussian mixture classification problem.
arXiv Detail & Related papers (2021-07-01T06:57:50Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question, corrections of Artificial Intelligence (AI) errors.
Special external devices, correctors, are developed. They should provide quick and non-iterative system fix without modification of a legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper we consider the problem of binary classification with adversarial perturbations.
We introduce a generalization to the max-margin classifier which takes into account the power of the adversary in manipulating the data.
Under some mild assumptions on the loss function, we theoretically show that the gradient descents converge to the RM classifier in its direction.
arXiv Detail & Related papers (2020-10-29T07:20:37Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(mu||mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z) - Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.