Robust Learning under Strong Noise via SQs
- URL: http://arxiv.org/abs/2010.09106v1
- Date: Sun, 18 Oct 2020 21:02:26 GMT
- Title: Robust Learning under Strong Noise via SQs
- Authors: Ioannis Anagnostides, Themis Gouleakis, Ali Marashian
- Abstract summary: We show that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.
This setting substantially generalizes the widely-studied problem of classification under RCN with known noise probabilities.
- Score: 5.9256596453465225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work provides several new insights on the robustness of Kearns'
statistical query framework against challenging label-noise models. First, we
build on a recent result by \cite{DBLP:journals/corr/abs-2006-04787} that
showed noise tolerance of distribution-independently evolvable concept classes
under Massart noise. Specifically, we extend their characterization to more
general noise models, including the Tsybakov model which considerably
generalizes the Massart condition by allowing the flipping probability to be
arbitrarily close to $\frac{1}{2}$ for a subset of the domain. As a corollary,
we employ an evolutionary algorithm by \cite{DBLP:conf/colt/KanadeVV10} to
obtain the first polynomial time algorithm with arbitrarily small excess error
for learning linear threshold functions over any spherically symmetric
distribution in the presence of spherically symmetric Tsybakov noise. Moreover,
we posit access to a stronger oracle, in which for every labeled example we
additionally obtain its flipping probability. In this model, we show that every
SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$
misclassification error for a broad class of noise models. This setting
substantially generalizes the widely-studied problem of classification under
RCN with known noise rate, and corresponds to a non-convex optimization problem
even when the noise function -- i.e. the flipping probabilities of all points
-- is known in advance.
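To make the noise models concrete, here is a minimal, illustrative Python sketch (not code from the paper) that simulates labels under the Massart and Tsybakov conditions and models the stronger oracle described above, which returns each example's flipping probability alongside its label. The halfspace target, the Gaussian marginal (one spherically symmetric distribution), and the specific noise functions eta(x) are assumptions chosen purely for illustration.

```python
# Illustrative sketch of the label-noise models in the abstract; all concrete
# choices (target halfspace, Gaussian marginal, specific eta functions) are
# assumptions for demonstration, not the authors' construction.
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)          # true concept: sign(<w*, x>)

def massart_eta(x, eta_max=0.4):
    # Massart condition: eta(x) <= eta_max < 1/2 uniformly over the domain.
    # (Drawn at random per example here for simplicity.)
    return eta_max * rng.uniform(0.0, 1.0)

def tsybakov_eta(x, alpha=2.0):
    # One common form of the Tsybakov condition: eta(x) may get arbitrarily
    # close to 1/2, but for every t > 0 the mass of {x : eta(x) >= 1/2 - t}
    # is O(t^alpha). The draw below satisfies this bound.
    t = rng.uniform(0.0, 1.0) ** (1.0 / alpha)
    return 0.5 - 0.5 * t

def strong_oracle(n, eta_fn):
    """Return (examples, noisy labels, flipping probabilities).

    The third output models the stronger oracle posited in the abstract,
    which reveals eta(x) alongside each labeled example.
    """
    X = rng.standard_normal((n, d))       # spherically symmetric marginal
    clean = np.sign(X @ w_star)
    etas = np.array([eta_fn(x) for x in X])
    flips = rng.uniform(size=n) < etas
    noisy = np.where(flips, -clean, clean)
    return X, noisy, etas

X, y, etas = strong_oracle(1000, tsybakov_eta)
print("empirical noise rate:", np.mean(y != np.sign(X @ w_star)))
```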
Related papers
- Robust Learning under Hybrid Noise [24.36707245704713]
We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery.
arXiv Detail & Related papers (2024-07-04T16:13:25Z) - Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise [19.496063739638924]
We consider a Bayesian inference problem for a structured spiked matrix model.
We show how to predict the statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2024-05-31T16:38:35Z) - On Convergence of Adam for Stochastic Optimization under Relaxed
Assumptions [4.9495085874952895]
The Adaptive Momentum Estimation (Adam) algorithm is highly effective in various deep learning tasks.
We show that Adam can find a stationary point with high probability under this general noise model.
arXiv Detail & Related papers (2024-02-06T13:19:26Z) - Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids the arbitrary tuning from a mini-batch of samples used in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Boosting in the Presence of Massart Noise [49.72128048499074]
We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise.
Our main result is the first computationally efficient boosting algorithm in the presence of Massart noise.
As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
arXiv Detail & Related papers (2021-06-14T22:21:25Z) - Sample-Optimal PAC Learning of Halfspaces with Malicious Noise [4.8728183994912415]
We study efficient PAC learning of halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985).
We present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$.
We extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
arXiv Detail & Related papers (2021-02-11T20:18:20Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)