Efficient active learning of sparse halfspaces with arbitrary bounded
noise
- URL: http://arxiv.org/abs/2002.04840v3
- Date: Fri, 13 Aug 2021 08:18:06 GMT
- Title: Efficient active learning of sparse halfspaces with arbitrary bounded
noise
- Authors: Chicheng Zhang and Jie Shen and Pranjal Awasthi
- Abstract summary: We study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave.
Under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse.
This is the first efficient algorithm with label complexity polynomial in $\frac{1}{1-2\eta}$ in this setting, which is label-efficient even for $\eta$ arbitrarily close to $\frac12$.
- Score: 34.406103025985445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study active learning of homogeneous $s$-sparse halfspaces in
$\mathbb{R}^d$ under the setting where the unlabeled data distribution is
isotropic log-concave and each label is flipped with probability at most $\eta$
for a parameter $\eta \in \big[0, \frac12\big)$, known as the bounded noise.
Even in the presence of mild label noise, i.e. $\eta$ is a small constant, this
is a challenging problem and only recently have label complexity bounds of the
form $\tilde{O}\big(s \cdot \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$ been
established in [Zhang, 2018] for computationally efficient algorithms. In
contrast, under high levels of label noise, the label complexity bounds
achieved by computationally efficient algorithms are much worse: the best known
result of [Awasthi et al., 2016] provides a computationally efficient algorithm
with label complexity $\tilde{O}\big((\frac{s \ln
d}{\epsilon})^{2^{\mathrm{poly}(1/(1-2\eta))}} \big)$, which is label-efficient
only when the noise rate $\eta$ is a fixed constant. In this work, we
substantially improve on it by designing a polynomial time algorithm for active
learning of $s$-sparse halfspaces, with a label complexity of
$\tilde{O}\big(\frac{s}{(1-2\eta)^4} \mathrm{polylog} (d, \frac 1 \epsilon)
\big)$. This is the first efficient algorithm with label complexity polynomial
in $\frac{1}{1-2\eta}$ in this setting, which is label-efficient even for
$\eta$ arbitrarily close to $\frac12$. Our active learning algorithm and its
theoretical guarantees also immediately translate to new state-of-the-art label
and sample complexity results for full-dimensional active and passive halfspace
learning under arbitrary bounded noise. The key insight of our algorithm and
analysis is a new interpretation of online learning regret inequalities, which
may be of independent interest.
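To make the setting concrete, the bounded-noise condition in the abstract can be written as $\Pr[y \neq \mathrm{sign}(w^* \cdot x) \mid x] \le \eta < \frac12$ for the target halfspace $w^*$. The sketch below (in Python) illustrates only the generic margin-based localize-and-refine template that this line of work builds on: query labels near the current decision boundary, make mistake-driven Perceptron-style updates, and shrink the query region each round. It is a hedged illustration, not the paper's algorithm; the actual method additionally exploits $s$-sparsity and an online-learning-based update to obtain the stated label complexity, and the step size, band schedule, and `query_label` oracle below are all illustrative assumptions.

    import numpy as np

    def margin_based_active_learner(X, query_label, rounds=10,
                                    queries_per_round=200, lr=0.1, seed=0):
        """Illustrative margin-based active learning of a homogeneous halfspace.

        X           : (n, d) pool of unlabeled points, assumed roughly isotropic.
        query_label : oracle i -> {+1, -1}; each returned label may be flipped
                      with probability at most eta < 1/2 (bounded noise).
        Returns a unit-norm weight vector. This is NOT the paper's algorithm,
        only the generic localize-and-refine template it builds on.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        w[0] = 1.0          # arbitrary unit-norm initialization
        band = 1.0          # width of the sampling region around the boundary

        for _ in range(rounds):
            # Localization: query only points near the current boundary,
            # where labels are most informative.
            near = np.flatnonzero(np.abs(X @ w) <= band)
            if near.size == 0:
                break
            idx = rng.choice(near, size=min(queries_per_round, near.size),
                             replace=False)

            # Refinement: mistake-driven Perceptron-style updates on queried points.
            for i in idx:
                y = query_label(i)
                if y * (X[i] @ w) <= 0:
                    w = w + lr * y * X[i]
                    w = w / np.linalg.norm(w)

            band /= 2.0     # shrink the query region geometrically
        return w

One natural way to adapt this sketch to the sparse setting (not necessarily the paper's approach) is to keep only the $s$ largest-magnitude coordinates of $w$ after each round, so that the number of label queries can scale with $s$ and $\mathrm{polylog}(d)$ rather than with $d$; the constants and schedules above are placeholders, not tuned values.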
Related papers
- A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise [36.29182619215646]
We study the problem of PAC learning $\gamma$-margin halfspaces in the presence of Massart noise.
Our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
arXiv Detail & Related papers (2025-01-16T17:44:18Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega\big(d^{1/2}/(\max\{p, \epsilon\})^2\big)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - On the Power of Localized Perceptron for Label-Optimal Learning of
Halfspaces with Adversarial Noise [4.8728183994912415]
We study online active learning of homogeneous halfspaces in $\mathbb{R}^d$ with adversarial noise.
Our main contribution is a Perceptron-like active learning algorithm that runs in polynomial time.
Our algorithm learns the underlying halfspace with near-optimal label complexity.
arXiv Detail & Related papers (2020-12-19T22:04:10Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Attribute-Efficient Learning of Halfspaces with Malicious Noise:
Near-Optimal Label Complexity and Noise Tolerance [21.76197397540397]
This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under malicious noise.
We show that the sample complexity is $\tilde{O}\big(\frac{1}{\epsilon} s^2 \log^5 d \big)$, which also enjoys attribute efficiency.
arXiv Detail & Related papers (2020-06-06T04:57:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.