Related papers: On the Error Resistance of Hinge Loss Minimization

On the Error Resistance of Hinge Loss Minimization

URL: http://arxiv.org/abs/2012.00989v1
Date: Wed, 2 Dec 2020 06:49:24 GMT
Title: On the Error Resistance of Hinge Loss Minimization
Authors: Kunal Talwar
Abstract summary: We identify a set of conditions on the data under which surrogate loss minimization algorithms provably learn the correct classifier. In particular, we show that if the data is linearly classifiable with a slightly non-trivial margin, then surrogate loss minimization has negligible error on the uncorrupted data.
Score: 30.808062097285706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Commonly used classification algorithms in machine learning, such as support vector machines, minimize a convex surrogate loss on training examples. In practice, these algorithms are surprisingly robust to errors in the training data. In this work, we identify a set of conditions on the data under which such surrogate loss minimization algorithms provably learn the correct classifier. This allows us to establish, in a unified framework, the robustness of these algorithms under various models on data as well as error. In particular, we show that if the data is linearly classifiable with a slightly non-trivial margin (i.e. a margin at least $C/\sqrt{d}$ for $d$-dimensional unit vectors), and the class-conditional distributions are near isotropic and logconcave, then surrogate loss minimization has negligible error on the uncorrupted data even when a constant fraction of examples are adversarially mislabeled.

Related papers

Classification Error Bound for Low Bayes Error Conditions in Machine Learning [50.25063912757367]
We study the relationship between the error mismatch and the Kullback-Leibler divergence in machine learning. Motivated by recent observations of low model-based classification errors in many machine learning tasks, we propose a linear approximation of the classification error bound for low Bayes error conditions.
arXiv Detail & Related papers (2025-01-27T11:57:21Z)
A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
Robust Online Covariance and Sparse Precision Estimation Under Arbitrary Data Corruption [1.5850859526672516]
We introduce a modified trimmed-inner-product algorithm to robustly estimate the covariance in an online scenario. We provide the error-bound and convergence properties of the estimates to the true precision matrix under our algorithms.
arXiv Detail & Related papers (2023-09-16T05:37:28Z)
Throwing Away Data Improves Worst-Class Error in Imbalanced Classification [36.91428748713018]
Class imbalances pervade classification problems, yet their treatment differs in theory and practice. We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
arXiv Detail & Related papers (2022-05-23T23:43:18Z)
The Interplay between Distribution Parameters and the Accuracy-Robustness Tradeoff in Classification [0.0]
Adrial training tends to result in models that are less accurate on natural (unperturbed) examples compared to standard models. This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution. In this work, we focus on the latter case under a binary Gaussian mixture classification problem.
arXiv Detail & Related papers (2021-07-01T06:57:50Z)
High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question, corrections of Artificial Intelligence (AI) errors. Special external devices, correctors, are developed. They should provide quick and non-iterative system fix without modification of a legacy AI system. New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z)
Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper we consider the problem of binary classification with adversarial perturbations. We introduce a generalization to the max-margin classifier which takes into account the power of the adversary in manipulating the data. Under some mild assumptions on the loss function, we theoretically show that the gradient descents converge to the RM classifier in its direction.
arXiv Detail & Related papers (2020-10-29T07:20:37Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(mu||mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)
Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training. ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.