Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
- URL: http://arxiv.org/abs/2006.16200v1
- Date: Mon, 29 Jun 2020 17:10:10 GMT
- Title: Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
- Authors: Ilias Diakonikolas, Daniel M. Kane, Nikos Zarifis
- Abstract summary: We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
- Score: 49.60752558064027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the fundamental problems of agnostically learning halfspaces and
ReLUs under Gaussian marginals. In the former problem, given labeled examples
$(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \{ \pm
1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and
the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1
loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the
best-fitting halfspace. In the latter problem, given labeled examples
$(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times
\mathbb{R}$, whose marginal distribution on $\mathbf{x}$ is the standard
Gaussian and the labels $y$ can be arbitrary, the goal is to output a
hypothesis with square loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is
the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower
bounds of $d^{\mathrm{poly}(1/\epsilon)}$ for both of these problems. Our SQ
lower bounds provide strong evidence that current upper bounds for these tasks
are essentially best possible.
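For concreteness, the sketch below evaluates the two objectives defined in the abstract on synthetic data with standard Gaussian marginals. It is not from the paper, which proves lower bounds rather than giving an algorithm; the sample size, noise levels, and the reference vector `w_ref` are illustrative assumptions.

```python
# Minimal illustrative sketch (assumptions: NumPy, synthetic data, placeholder w_ref):
# the two agnostic-learning objectives from the abstract, evaluated empirically on
# samples whose marginal on x is the standard Gaussian.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000

# Standard Gaussian marginal on x; labels may be arbitrary. Here they come from a
# corrupted halfspace (for the 0-1 objective) and a noisy ReLU (for the square loss).
X = rng.standard_normal((n, d))
w_ref = rng.standard_normal(d)
y_pm1 = np.where(X @ w_ref >= 0, 1.0, -1.0)
y_pm1[rng.random(n) < 0.1] *= -1.0                       # arbitrary label corruption
y_real = np.maximum(X @ w_ref, 0.0) + 0.1 * rng.standard_normal(n)

def zero_one_loss(w, X, y):
    """Empirical 0-1 loss of the halfspace x -> sign(<w, x>)."""
    return float(np.mean(np.where(X @ w >= 0, 1.0, -1.0) != y))

def relu_square_loss(w, X, y):
    """Empirical square loss of the ReLU x -> max(<w, x>, 0)."""
    return float(np.mean((np.maximum(X @ w, 0.0) - y) ** 2))

# Agnostic learning asks for a hypothesis whose loss is OPT + eps, where OPT is the
# loss of the best halfspace (resp. best-fitting ReLU); w_ref stands in for that optimum.
print("0-1 loss at w_ref:   ", zero_one_loss(w_ref, X, y_pm1))
print("square loss at w_ref:", relu_square_loss(w_ref, X, y_real))
```

An agnostic learner for either task must return a hypothesis whose loss exceeds the corresponding OPT by at most $\epsilon$; the paper's $d^{\mathrm{poly}(1/\epsilon)}$ SQ lower bound applies to any such learner in the Statistical Query model.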
Related papers
- Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals [43.0867217287089]
We study the task of agnostically learning halfspaces under the Gaussian distribution.
We prove a near-optimal computational hardness result for this task.
arXiv Detail & Related papers (2023-02-13T16:46:23Z)
- SQ Lower Bounds for Learning Single Neurons with Massart Noise [40.1662767099183]
We study PAC learning of a single neuron in the presence of Massart noise.
We prove that no efficient SQ algorithm can approximate the optimal error within any constant factor.
arXiv Detail & Related papers (2022-10-18T15:58:00Z)
- Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Beyond Independent Measurements: General Compressed Sensing with GNN Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z)
- Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
- Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$ (a minimal gradient-descent sketch on this objective appears after this list).
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
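As referenced in the last entry above, the following is a minimal sketch of full-batch (sub)gradient descent on the empirical square loss of a single ReLU neuron, in the spirit of the gradient-descent papers listed. The step size, iteration count, initialization, and noise model are assumptions; no claim is made about recovering the cited $O(\mathsf{OPT}^{1/2}) + \epsilon$ guarantee or the papers' exact algorithms.

```python
# Minimal sketch, under stated assumptions: plain full-batch (sub)gradient descent on
# the empirical square loss of a single ReLU neuron with Gaussian inputs. Illustrative
# only; not the exact algorithm or analysis of the cited papers.
import numpy as np

rng = np.random.default_rng(1)
d, n, step, iters = 5, 10_000, 0.1, 500

X = rng.standard_normal((n, d))                      # standard Gaussian marginals
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.standard_normal(n)  # stand-in for agnostic noise

w = 0.1 * rng.standard_normal(d)                     # small random init (zero init gets stuck)
for _ in range(iters):
    pred = np.maximum(X @ w, 0.0)
    # (Sub)gradient of 0.5 * mean squared loss; ReLU'(z) is the indicator z > 0.
    grad = ((pred - y) * (X @ w > 0.0)) @ X / n
    w -= step * grad

print("final empirical square loss:", np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```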