Related papers: Sample-Optimal PAC Learning of Halfspaces with Malicious Noise

URL: http://arxiv.org/abs/2102.06247v1
Date: Thu, 11 Feb 2021 20:18:20 GMT
Title: Sample-Optimal PAC Learning of Halfspaces with Malicious Noise
Authors: Jie Shen
Abstract summary: We study efficient PAC learning of halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). We present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$. We extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
Score: 4.8728183994912415
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a Matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
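
The abstract's main ingredient, a spectral bound on an empirical covariance matrix, can be illustrated numerically. The sketch below is not the paper's algorithm or proof. It assumes a standard Gaussian as a stand-in for an isotropic log-concave distribution, and the function names (`empirical_covariance`, `top_eigenvalue`, `spectrum_bounds`) are hypothetical. It estimates the extreme eigenvalues of the empirical covariance by power iteration, so one can check that they stay near 1 once the sample size is a modest multiple of the dimension.

```python
import random


def empirical_covariance(samples):
    """Return (1/n) * sum_i x_i x_i^T for samples assumed to have zero mean."""
    n, d = len(samples), len(samples[0])
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += x[i] * x[j] / n
    return cov


def top_eigenvalue(matrix, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Rayleigh quotient v^T M v for the final unit vector v.
    return sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))


def spectrum_bounds(n, d, seed=0):
    """Estimate (smallest, largest) eigenvalue of the empirical covariance
    of n standard Gaussian samples in dimension d (illustrative assumption)."""
    rng = random.Random(seed)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    cov = empirical_covariance(samples)
    lam_max = top_eigenvalue(cov)
    # Shift trick: the top eigenvalue of (c*I - cov) equals c - lambda_min(cov).
    c = lam_max + 1.0
    shifted = [[(c if i == j else 0.0) - cov[i][j] for j in range(d)]
               for i in range(d)]
    lam_min = c - top_eigenvalue(shifted)
    return lam_min, lam_max


if __name__ == "__main__":
    for n in (50, 500, 5000):
        lo, hi = spectrum_bounds(n, d=5)
        print(f"n={n}: eigenvalues roughly in [{lo:.3f}, {hi:.3f}]")
```

Running the script with growing `n` should show the estimated spectrum tightening around 1, which is the qualitative behavior a Matrix Chernoff-type bound quantifies.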

Related papers

Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic proximal samplers (SPS) for sampling from non-log-concave distributions are studied. We show that convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded. We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
arXiv Detail & Related papers (2024-05-27T00:53:18Z)
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density. We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$. We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
Improved dimension dependence of a proximal algorithm for sampling [16.147290924171692]
We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings. Our algorithm is based on the proximal sampler introduced in Lee et al. (2021). For strongly log-concave distributions, our method has complexity bound $\tilde{\mathcal{O}}(\kappa d^{1/2})$ without warm start. For distributions satisfying the LSI, our bound is $\tilde{\mathcal{O}}(\hat{\kappa} d^{1/2})$ where $\hat{\kappa}$ is the ratio between smoothness and
arXiv Detail & Related papers (2023-02-20T16:44:48Z)
Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization. We provide novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs. Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems. In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
Forster Decomposition and Learning Halfspaces with Noise [60.691817861402676]
A Forster transform is an operation that turns a distribution into one with good anti-concentration properties. We show that any distribution can be decomposed efficiently as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently.
arXiv Detail & Related papers (2021-07-12T17:00:59Z)
Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors [59.33977545294148]
We show that $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function. We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z)
Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noise [29.890039126644776]
We develop a PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise (Massart and Nédélec, 2006) and Tsybakov noise (Tsybakov, 2004). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions under which our algorithm achieves computational efficiency and provides label complexity guarantees strictly lower than those of passive learning algorithms.
arXiv Detail & Related papers (2021-02-10T08:17:17Z)
Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise. We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance [21.76197397540397]
This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. We show that the sample complexity is $\tilde{O}\big(\frac{1}{\epsilon} s^2 \log^5 d\big)$, which is also attribute-efficient.
arXiv Detail & Related papers (2020-06-06T04:57:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.