On Learning Gaussian Multi-index Models with Gradient Flow
- URL: http://arxiv.org/abs/2310.19793v2
- Date: Thu, 2 Nov 2023 17:33:13 GMT
- Title: On Learning Gaussian Multi-index Models with Gradient Flow
- Authors: Alberto Bietti, Joan Bruna and Loucas Pillaud-Vivien
- Score: 57.170617397894404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study gradient flow on the multi-index regression problem for
high-dimensional Gaussian data. Multi-index functions consist of a composition
of an unknown low-rank linear projection and an arbitrary unknown,
low-dimensional link function. As such, they constitute a natural template for
feature learning in neural networks.
We consider a two-timescale algorithm, whereby the low-dimensional link
function is learnt with a non-parametric model infinitely faster than the
subspace parametrizing the low-rank projection. By appropriately exploiting the
matrix semigroup structure arising over the subspace correlation matrices, we
establish global convergence of the resulting Grassmannian population gradient
flow dynamics, and provide a quantitative description of its associated
"saddle-to-saddle" dynamics. Notably, the timescales associated with each
saddle can be explicitly characterized in terms of an appropriate Hermite
decomposition of the target link function. In contrast with these positive
results, we also show that the related *planted* problem, where the link
function is known and fixed, in fact has a rough optimization landscape, in
which gradient flow dynamics might get trapped with high probability.
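
To make the setting concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code) that draws data from a Gaussian multi-index model $y = g(U^\top x)$ and estimates the probabilists' Hermite coefficients of a one-dimensional slice of the link function, the quantities that characterize the saddle timescales above. The dimensions, the frame U, and the toy link g are assumptions made for this example.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

rng = np.random.default_rng(0)
d, r, n = 128, 2, 10_000          # ambient dim, index dim, samples (toy choices)

# Planted subspace: a random d x r orthonormal frame U.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

def g(z):
    # Toy link function (an assumption for this sketch): mixes Hermite
    # degrees 1, 2 and 3 across the two index directions.
    return z[..., 0] + z[..., 0] * z[..., 1] + z[..., 1] ** 3

X = rng.standard_normal((n, d))   # x ~ N(0, I_d)
y = g(X @ U)                      # labels depend on x only through U^T x

def hermite_coeff(f, k, nodes=80):
    # k-th probabilists' Hermite coefficient c_k = E[f(Z) He_k(Z)] / k!,
    # Z ~ N(0, 1), via Gauss-HermiteE quadrature with weight exp(-z^2 / 2)
    # (the weights sum to sqrt(2 * pi)).
    z, w = hermegauss(nodes)
    He_k = hermeval(z, [0.0] * k + [1.0])
    return (w @ (f(z) * He_k)) / np.sqrt(2 * np.pi) / factorial(k)

# Slice of g along the second index direction: z -> z^3 = He_3(z) + 3 He_1(z).
f = lambda z: g(np.stack([np.zeros_like(z), z], axis=-1))
print([round(float(hermite_coeff(f, k)), 3) for k in range(5)])  # [0.0, 3.0, 0.0, 1.0, 0.0]
```

Here the degree of the lowest non-zero coefficient along a direction plays the role of an information exponent that, per the abstract, sets the timescale of the corresponding saddle.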
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Symmetric Single Index Learning [46.7352578439663]
One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown link function.
We consider single index learning in the setting of symmetric neural networks.
arXiv Detail & Related papers (2023-10-03T14:59:00Z)
- On Single Index Models beyond Gaussian Data [45.875461749455994]
Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods.
In this work, we explore extensions of this picture beyond the Gaussian setting, where stability or symmetry might be violated.
Our main results establish that gradient descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime.
arXiv Detail & Related papers (2023-07-28T20:52:22Z)
- Learning Single-Index Models with Shallow Neural Networks [43.6480804626033]
We introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow.
We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.
arXiv Detail & Related papers (2022-10-27T17:52:58Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Gradient flows and randomised thresholding: sparse inversion and classification [0.0]
Sparse inversion and classification problems are ubiquitous in modern data science and imaging.
In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg–Landau energy.
Standard (sub)gradient descent methods have been shown to be inefficient on such problems.
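
As a generic point of reference (not the randomised scheme proposed in the paper), the classical iterative soft-thresholding algorithm (ISTA) couples a discretised gradient flow on the smooth part with a thresholding step; a minimal sketch for an $\ell_1$-penalised least-squares problem:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (entrywise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, steps=500):
    # Minimises 0.5 * ||A x - b||^2 + lam * ||x||_1: a gradient step on the
    # smooth data-fidelity term followed by soft thresholding.
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x
```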
arXiv Detail & Related papers (2022-03-22T09:21:14Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
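
For orientation (this is not the SSD algorithm of the paper, whose scaling and stochastic updates differ), a deterministic gradient ascent on the generalized Rayleigh quotient recovers the top generalized eigenvector; the step size and iteration count below are illustrative:

```python
import numpy as np

def top_generalized_eigvec(A, B, eta=0.05, iters=5000, seed=0):
    # Ascent on R(x) = (x^T A x) / (x^T B x), renormalising so x^T B x = 1;
    # converges to the top generalized eigenvector of (A, B) for
    # symmetric A and positive-definite B (with a small enough eta).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.sqrt(x @ B @ x)
    for _ in range(iters):
        rho = x @ A @ x                      # Rayleigh quotient (x^T B x = 1)
        x = x + eta * (A @ x - rho * B @ x)  # gradient direction, up to scale
        x /= np.sqrt(x @ B @ x)
    return x
```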
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- T-LoHo: A Bayesian Regularization Model for Structured Sparsity and Smoothness on Graphs [0.0]
In graph-structured data, zero and non-zero parameters tend to cluster together, exhibiting structured sparsity and smoothness.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z)