Threshold Phenomena in Learning Halfspaces with Massart Noise
- URL: http://arxiv.org/abs/2108.08767v1
- Date: Thu, 19 Aug 2021 16:16:48 GMT
- Title: Threshold Phenomena in Learning Halfspaces with Massart Noise
- Authors: Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Christos Tzamos,
Nikos Zarifis
- Abstract summary: We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
- Score: 56.01192577666607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with
Massart noise under Gaussian marginals. In the Massart noise model, an
adversary is allowed to flip the label of each point $\mathbf{x}$ with
probability $\eta(\mathbf{x}) \leq \eta$, for some parameter $\eta \in
[0,1/2]$.
The goal of the learner is to output a hypothesis with misclassification
error $\mathrm{opt} + \epsilon$, where $\mathrm{opt}$ is the error of the
target halfspace. Prior work studied this problem assuming that the target
halfspace is homogeneous and that the parameter $\eta$ is strictly smaller than
$1/2$. We explore how the complexity of the problem changes when either of
these assumptions is removed, establishing the following threshold phenomena:
For $\eta = 1/2$, we prove a lower bound of $d^{\Omega (\log(1/\epsilon))}$
on the complexity of any Statistical Query (SQ) algorithm for the problem,
which holds even for homogeneous halfspaces. On the positive side, we give a
new learning algorithm for arbitrary halfspaces in this regime with sample
complexity and running time $O_\epsilon(1) \, d^{O(\log(1/\epsilon))}$.
For $\eta <1/2$, we establish a lower bound of $d^{\Omega(\log(1/\gamma))}$
on the SQ complexity of the problem, where $\gamma = \max\{\epsilon,
\min\{\mathbf{Pr}[f(\mathbf{x}) = 1], \mathbf{Pr}[f(\mathbf{x}) = -1]\} \}$ and
$f$ is the target halfspace. In particular, this implies an SQ lower bound of
$d^{\Omega (\log(1/\epsilon) )}$ for learning arbitrary Massart halfspaces
(even for small constant $\eta$). We complement this lower bound with a new
learning algorithm for this regime with sample complexity and runtime
$d^{O_{\eta}(\log(1/\gamma))} \mathrm{poly}(1/\epsilon)$.
Taken together, our results qualitatively characterize the complexity of
learning halfspaces in the Massart model.
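To make the setup above concrete, here is a minimal sketch (an illustration written for this summary, not code from the paper) of the Massart data model and of the bias parameter $\gamma$. The exponential flip rule is one arbitrary choice of adversary, and the helper names `sample_massart_halfspace` and `gamma_param` are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def sample_massart_halfspace(n, d, w, t, eta, seed=0):
    """Draw n examples: x ~ N(0, I_d), clean label f(x) = sign(<w, x> - t),
    each label flipped with probability eta(x) <= eta. The flip rule below
    is an illustrative adversary; the model allows any eta(x) <= eta."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))             # Gaussian marginals
    margin = X @ w - t
    y_clean = np.where(margin >= 0, 1.0, -1.0)  # target halfspace f
    flip_prob = eta * np.exp(-np.abs(margin))   # flips hardest near the boundary
    y = np.where(rng.random(n) < flip_prob, -y_clean, y_clean)
    return X, y

def gamma_param(t, eps):
    """gamma = max{eps, min{Pr[f(x)=1], Pr[f(x)=-1]}}. For unit w and
    x ~ N(0, I_d), rotation invariance gives Pr[f(x) = 1] = 1 - Phi(t)."""
    p_pos = 1.0 - norm.cdf(t)
    return max(eps, min(p_pos, 1.0 - p_pos))

w = np.ones(10) / np.sqrt(10)                   # unit normal vector
X, y = sample_massart_halfspace(1000, 10, w, t=0.5, eta=0.3)
print(gamma_param(t=0.0, eps=0.01))  # 0.5: homogeneous halfspaces are balanced
print(gamma_param(t=4.0, eps=0.01))  # 0.01: heavily biased, gamma falls to eps
```

Note how $\gamma = 1/2$ for homogeneous halfspaces, whereas for heavily biased halfspaces $\gamma$ drops all the way to $\epsilon$; this is why the $d^{\Omega(\log(1/\gamma))}$ lower bound specializes to $d^{\Omega(\log(1/\epsilon))}$ for arbitrary Massart halfspaces.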
Related papers
- SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z)
- Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
- Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z) - Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise [9.378684220920562]
tightest statistical query (SQ) lower bounds for learnining halfspaces in the presence of Massart noise.
We show that for arbitrary $eta in [0,1/2]$ every SQ algorithm achieving misclassification error better than $eta$ requires queries of superpolynomial accuracy.
arXiv Detail & Related papers (2022-01-24T17:33:19Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
- On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with such oracles and achieves complexity bounds of $\tilde{O}(n^2 + n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$ for $\mu > 0$ and $\mu = 0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
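Most of the lower bounds above are stated in the Statistical Query (SQ) model, in which a learner never inspects individual samples and instead receives expectations of bounded queries up to a tolerance $\tau$. Below is a minimal simulation of the standard STAT($\tau$) oracle, written for this summary under the assumption that queries take values in $[-1, 1]$; the function name `stat_oracle` is hypothetical.

```python
import numpy as np

def stat_oracle(phi, X, y, tau, seed=0):
    """Answer an SQ query: for a bounded query phi(x, y) in [-1, 1],
    return E[phi(x, y)] up to adversarial tolerance tau. Here the
    expectation is approximated by an empirical mean over (X, y) and
    the adversary is simulated by a random perturbation in [-tau, tau]."""
    rng = np.random.default_rng(seed)
    est = np.mean([phi(xi, yi) for xi, yi in zip(X, y)])
    return float(np.clip(est + rng.uniform(-tau, tau), -1.0, 1.0))
```

Roughly speaking, an SQ lower bound of $d^{\Omega(\log(1/\epsilon))}$ says that any SQ learner must either issue that many queries or demand tolerance $\tau$ smaller than $d^{-\Omega(\log(1/\epsilon))}$, matching the phrase "queries of superpolynomial accuracy" above.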