On Single Index Models beyond Gaussian Data
- URL: http://arxiv.org/abs/2307.15804v2
- Date: Wed, 25 Oct 2023 15:57:02 GMT
- Title: On Single Index Models beyond Gaussian Data
- Authors: Joan Bruna, Loucas Pillaud-Vivien and Aaron Zweig
- Abstract summary: Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods.
In this work, we explore extensions of this picture beyond the Gaussian setting, where either stability or symmetry might be violated.
Our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime.
- Score: 45.875461749455994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse high-dimensional functions have arisen as a rich framework to study
the behavior of gradient-descent methods using shallow neural networks,
showcasing their ability to perform feature learning beyond linear models.
Amongst those functions, the simplest are single-index models $f(x) = \phi( x
\cdot \theta^*)$, where the labels are generated by an arbitrary non-linear
scalar link function $\phi$ applied to an unknown one-dimensional projection
$\theta^*$ of the input data. By focusing on Gaussian data, several recent
works have built a remarkable picture, where the so-called information exponent
(related to the regularity of the link function) controls the required sample
complexity. In essence, these tools exploit the stability and spherical
symmetry of Gaussian distributions. In this work, building from the framework
of \cite{arous2020online}, we explore extensions of this picture beyond the
Gaussian setting, where either stability or symmetry might be violated. Focusing
on the planted setting where $\phi$ is known, our main results establish that
Stochastic Gradient Descent can efficiently recover the unknown direction
$\theta^*$ in the high-dimensional regime, under assumptions that extend
previous works \cite{yehudai2020learning,wu2022learning}.
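As an illustration of the planted setting above, the following is a minimal sketch (not the authors' code) of spherical online SGD recovering $\theta^*$ when the link $\phi$ is known and the inputs are non-Gaussian. The specific link, input distribution, step size, and number of iterations are illustrative assumptions only.
```python
# Minimal sketch: spherical online SGD for a planted single-index model
# f(x) = phi(x . theta*), with a known link phi and non-Gaussian inputs.
# All concrete choices below (phi, input law, lr, horizon) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 200                               # ambient dimension
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

phi = lambda z: z ** 2 + z            # known scalar link (has a linear component)
dphi = lambda z: 2 * z + 1            # its derivative, used in the per-sample gradient

def sample_x():
    # Non-Gaussian inputs: uniform on [-sqrt(3), sqrt(3)]^d (zero mean, unit variance).
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=d)

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)        # random initialization on the unit sphere

lr = 0.2 / d                          # step size of order 1/d, in the spirit of online-SGD analyses
for _ in range(100 * d):              # O(d) samples are the relevant scale here
    x = sample_x()
    y = phi(x @ theta_star)                                 # planted label, known link
    grad = (phi(x @ theta) - y) * dphi(x @ theta) * x       # per-sample squared-loss gradient
    theta -= lr * grad
    theta /= np.linalg.norm(theta)                          # project back to the unit sphere

print("overlap |<theta, theta*>| =", abs(theta @ theta_star))
```
The projection back to the unit sphere after each step mirrors the spherical online-SGD dynamics of the framework cited above; it is used here only to keep the sketch self-contained.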
Related papers
- Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws [21.18373933718468]
We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime. We present a sharp analysis of the dynamics in the feature learning regime, for both the population limit and the finite-sample discretization.
arXiv Detail & Related papers (2025-08-05T17:57:56Z) - Learning single-index models via harmonic decomposition [22.919597674245612]
We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection. We introduce two families of estimators -- based on unfolding and online SGD -- that respectively achieve either optimal complexity or optimal runtime.
arXiv Detail & Related papers (2025-06-11T15:59:53Z) - Joint Learning in the Gaussian Single Index Model [6.3151583550712065]
We consider the problem of jointly learning a one-dimensional projection and a unidimensional function in high-dimensional Gaussian models. Our analysis shows that convergence still occurs even when the initial direction is negatively correlated with the target. On the practical side, we demonstrate that such joint learning can be effectively implemented using a Reproducing Kernel Hilbert Space adapted to the structure of the problem.
arXiv Detail & Related papers (2025-05-27T15:30:34Z) - Extended convexity and smoothness and their applications in deep learning [0.0]
In this paper, we introduce the notion of $\mathcal{H}$-smoothness, which relaxes the standard smoothness and strong convexity assumptions used in gradient-based analysis.
The effectiveness of the proposed methodology is validated through experiments.
arXiv Detail & Related papers (2024-10-08T08:40:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Symmetric Single Index Learning [46.7352578439663]
One popular model is the single-index model, in which labels are produced by applying a possibly unknown link function to an unknown linear projection of the input.
We consider single index learning in the setting of symmetric neural networks.
arXiv Detail & Related papers (2023-10-03T14:59:00Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue (a sketch of the two update rules appears after this list).
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of the feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with
Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.