Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample
Complexity for Learning Single Index Models
- URL: http://arxiv.org/abs/2305.10633v1
- Date: Thu, 18 May 2023 01:10:11 GMT
- Title: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample
Complexity for Learning Single Index Models
- Authors: Alex Damian, Eshaan Nichani, Rong Ge, Jason D. Lee
- Abstract summary: We focus on the task of learning a single index model $sigma(wstar cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions.
We show that online SGD on a smoothed loss learns $wstar$ with $n gtrsim dkstar/2$ samples.
- Score: 43.83997656986799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the task of learning a single index model $\sigma(w^\star \cdot
x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions.
Prior work has shown that the sample complexity of learning $w^\star$ is
governed by the information exponent $k^\star$ of the link function $\sigma$,
which is defined as the index of the first nonzero Hermite coefficient of
$\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples
suffice for learning $w^\star$ and that this is tight for online SGD. However,
the CSQ lower bound for gradient based methods only shows that $n \gtrsim
d^{k^\star/2}$ samples are necessary. In this work, we close the gap between
the upper and lower bounds by showing that online SGD on a smoothed loss learns
$w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to
statistical analyses of tensor PCA and to the implicit regularization effects
of minibatch SGD on empirical losses.
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n asymp T asymp C(q) cdot d
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Noise Stability Optimization for Flat Minima with Tight Rates [18.009376840944284]
We show how to minimize a function $F(W) = mathbbE_U[f(W + U)]$, given a function $f: mathbbRd rightarrow mathbbR$ and a random sample $U$ from a distribution $mathcalP$ with mean zero.
We design a simple, practical algorithm that adds noise along both $U$ and $-U$, with the option of adding several perturbeds and taking their average.
arXiv Detail & Related papers (2023-06-14T14:58:36Z) - Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $Gr(ngamma, n-alpha)$ subhypergraph in a $Gr(n, n-beta) hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z) - Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning of the graphical structure of a $p$-dimensional random vector $X in Rp$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses for the statistical benefit of multitask representation learning in linear Markov Decision Process (MDP)
We first discover a emphLeast-Activated-Feature-Abundance (LAFA) criterion, denoted as $kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $tildeO(H2sqrtfrackappa mathcalC(Phi)2 kappa dNT+frackappa dn)
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs
with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp.$$Q-learning) provably learns an $varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(bf x,y)$ are generated where $y = bf xtopOmega* + mathcalN(0,1)$ and $(bf x, y)$ is seen only if $y$ belongs to a truncation set $S subseteq mathbbRd$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.