Agnostic Learning of a Single Neuron with Gradient Descent
- URL: http://arxiv.org/abs/2005.14426v3
- Date: Mon, 31 Aug 2020 17:19:08 GMT
- Title: Agnostic Learning of a Single Neuron with Gradient Descent
- Authors: Spencer Frei and Yuan Cao and Quanquan Gu
- Abstract summary: We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
- Score: 92.7662890047311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning the best-fitting single neuron as
measured by the expected square loss $\mathbb{E}_{(x,y)\sim
\mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution
$\mathcal{D}$ by using gradient descent to minimize the empirical risk induced
by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function
$\sigma$ is an arbitrary Lipschitz and non-decreasing function, making the
optimization problem nonconvex and nonsmooth in general, and covers typical
neural network activation functions and inverse link functions in the
generalized linear model setting. In the agnostic PAC learning setting, where
no assumption on the relationship between the labels $y$ and the input $x$ is
made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient
descent achieves population risk $O(\mathsf{OPT})+\epsilon$ in polynomial time
and sample complexity when $\sigma$ is strictly increasing. For the ReLU
activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
When labels take the form $y = \sigma(v^\top x) + \xi$ for zero-mean
sub-Gaussian noise $\xi$, we show that the population risk guarantees for
gradient descent improve to $\mathsf{OPT} + \epsilon$. Our sample complexity
and runtime guarantees are (almost) dimension independent, and when $\sigma$ is
strictly increasing, require no distributional assumptions beyond boundedness.
For ReLU, we show the same results under a nondegeneracy assumption for the
marginal distribution of the input.
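As a concrete illustration of the procedure analyzed above, the following is a minimal NumPy sketch (not the paper's exact algorithm or experimental setup): it draws samples in the noisy-teacher setting $y = \sigma(v^\top x) + \xi$ with a ReLU activation, runs plain gradient descent on the empirical square loss, and estimates the population risk on held-out samples. The Gaussian input marginal, noise level, step size, iteration count, and initialization are illustrative assumptions, not choices taken from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def relu_subgrad(z):
    # A subgradient of ReLU; the paper allows any Lipschitz, non-decreasing sigma.
    return (z > 0).astype(float)

d, n = 20, 5000
v = rng.normal(size=d)
v /= np.linalg.norm(v)                  # ground-truth neuron (unit norm)

X = rng.normal(size=(n, d))             # illustrative sub-Gaussian input marginal
xi = 0.1 * rng.normal(size=n)           # zero-mean sub-Gaussian label noise
y = relu(X @ v) + xi                    # noisy-teacher labels y = sigma(v^T x) + xi

def empirical_risk(w, X, y):
    return np.mean((relu(X @ w) - y) ** 2)

def gd_step(w, X, y, lr):
    z = X @ w
    residual = relu(z) - y
    # Gradient of the empirical square loss: (2/n) * X^T [ sigma'(z) * (sigma(z) - y) ]
    grad = 2.0 * X.T @ (relu_subgrad(z) * residual) / X.shape[0]
    return w - lr * grad

w = 0.01 * rng.normal(size=d)           # small random init (zero init stalls for ReLU)
for _ in range(2000):
    w = gd_step(w, X, y, lr=0.1)

# Fresh samples approximate the population risk E[(sigma(w^T x) - y)^2].
X_test = rng.normal(size=(10 * n, d))
y_test = relu(X_test @ v) + 0.1 * rng.normal(size=10 * n)
print("empirical risk        :", empirical_risk(w, X, y))
print("held-out risk estimate:", empirical_risk(w, X_test, y_test))
```
Swapping the ReLU for a strictly increasing activation such as softplus corresponds to the abstract's $O(\mathsf{OPT})+\epsilon$ regime without distributional assumptions beyond boundedness, though the step size and iteration count above remain purely illustrative.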
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z) - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether or not the problem is realizable, and we refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A Fast Algorithm for Adaptive Private Mean Estimation [5.090363690988394]
We design an $(\varepsilon, \delta)$-differentially private algorithm that is adaptive to $\Sigma$.
The estimator achieves optimal rates of convergence with respect to the induced Mahalanobis norm $\|\cdot\|_\Sigma$.
arXiv Detail & Related papers (2023-01-17T18:44:41Z) - Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models:
Sharp Minimax Rates [1.0309387309011746]
We study the signal detection problem against sparse alternatives, for known sparsity $s$.
We find minimax upper and lower bounds over the minimax separation radius $\epsilon^*$ and prove that they are always matching.
Our results reveal new phase transitions regarding the behavior of $\epsilon^*$ with respect to the level of sparsity, to the $L_t$ metric, and to the heteroscedasticity profile of $\Sigma$.
arXiv Detail & Related papers (2022-11-15T23:53:39Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\cdot\mathsf{OPT}+\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.