On Single Index Models beyond Gaussian Data
- URL: http://arxiv.org/abs/2307.15804v2
- Date: Wed, 25 Oct 2023 15:57:02 GMT
- Title: On Single Index Models beyond Gaussian Data
- Authors: Joan Bruna, Loucas Pillaud-Vivien and Aaron Zweig
- Abstract summary: Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods.
In this work, we explore extensions of this picture beyond the Gaussian setting, where stability or spherical symmetry may be violated.
Our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime.
- Score: 45.875461749455994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse high-dimensional functions have arisen as a rich framework to study
the behavior of gradient-descent methods using shallow neural networks,
showcasing their ability to perform feature learning beyond linear models.
Amongst those functions, the simplest are single-index models $f(x) = \phi( x
\cdot \theta^*)$, where the labels are generated by an arbitrary non-linear
scalar link function $\phi$ applied to an unknown one-dimensional projection
$\theta^*$ of the input data. By focusing on Gaussian data, several recent
works have built a remarkable picture, where the so-called information exponent
(related to the regularity of the link function) controls the required sample
complexity. In essence, these tools exploit the stability and spherical
symmetry of Gaussian distributions. In this work, building from the framework
of \cite{arous2020online}, we explore extensions of this picture beyond the
Gaussian setting, where both stability or symmetry might be violated. Focusing
on the planted setting where $\phi$ is known, our main results establish that
Stochastic Gradient Descent can efficiently recover the unknown direction
$\theta^*$ in the high-dimensional regime, under assumptions that extend
previous works \cite{yehudai2020learning,wu2022learning}.
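To make the planted setting concrete, below is a minimal sketch (not the authors' implementation) of online spherical SGD recovering $\theta^*$ when the link $\phi$ is known; the specific link function, step size, and dimension are illustrative assumptions.

```python
# Minimal sketch: online SGD on the sphere for a planted single-index model
# f(x) = phi(x . theta*), with phi known. All constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d = 200
phi = lambda z: z ** 2        # illustrative known link (information exponent 2)
dphi = lambda z: 2 * z        # its derivative

theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # initialize on the unit sphere

step_size = 0.005
for _ in range(50_000):
    x = rng.standard_normal(d)          # fresh sample at every step (online SGD)
    y = phi(x @ theta_star)             # noiseless planted label
    m = x @ theta
    grad = (phi(m) - y) * dphi(m) * x   # one-sample squared-loss gradient
    theta -= step_size * grad
    theta /= np.linalg.norm(theta)      # project back onto the sphere

print("overlap |<theta, theta*>|:", abs(theta @ theta_star))
```

The absolute overlap is reported because an even link such as $z^2$ cannot distinguish $\theta^*$ from $-\theta^*$.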
Related papers
- Extended convexity and smoothness and their applications in deep learning [0.0]
In this paper, we introduce the $\mathcal{H}$-smoothness framework for settings where gradient smoothness and strong convexity are not completely understood.
The effectiveness of the proposed methodology is validated through experiments.
arXiv Detail & Related papers (2024-10-08T08:40:07Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Symmetric Single Index Learning [46.7352578439663]
One popular model is the single-index model, in which labels are produced by an unknown linear projection with a possibly unknown link function.
We consider single index learning in the setting of symmetric neural networks.
arXiv Detail & Related papers (2023-10-03T14:59:00Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue (a toy sketch of this idea appears after this list).
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of the feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
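As referenced in the Gradient-Based Feature Learning under Structured Data entry above, here is a toy sketch of the weight-normalization idea under a spiked (anisotropic) input covariance; the link function, spike strength, and step size are illustrative assumptions rather than that paper's exact setup.

```python
# Toy sketch: weight-normalized online SGD on anisotropic (spiked) data.
# The direction is parametrized as v / ||v|| and gradient steps are taken on v.
import numpy as np

rng = np.random.default_rng(1)
d, spike = 200, 10.0
u = np.zeros(d); u[0] = 1.0            # spike direction of the data covariance
theta_star = u.copy()                  # planted direction aligned with the spike

phi = lambda z: np.tanh(z)             # illustrative odd link
dphi = lambda z: 1.0 - np.tanh(z) ** 2

def sample_x():
    # anisotropic input: identity covariance plus a rank-one spike along u
    return rng.standard_normal(d) + np.sqrt(spike) * rng.standard_normal() * u

v = rng.standard_normal(d)             # unconstrained parameter; direction is v/||v||
lr = 0.05
for _ in range(20_000):
    x = sample_x()
    theta = v / np.linalg.norm(v)
    g = (phi(x @ theta) - phi(x @ theta_star)) * dphi(x @ theta) * x
    # gradient through the normalization: tangent projection scaled by 1/||v||
    v -= lr * (g - (g @ theta) * theta) / np.linalg.norm(v)

theta = v / np.linalg.norm(v)
print("overlap with theta*:", abs(theta @ theta_star))
```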