Symmetric Single Index Learning
- URL: http://arxiv.org/abs/2310.02117v1
- Date: Tue, 3 Oct 2023 14:59:00 GMT
- Title: Symmetric Single Index Learning
- Authors: Aaron Zweig, Joan Bruna
- Abstract summary: One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown link function.
We consider single index learning in the setting of symmetric neural networks.
- Score: 46.7352578439663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few neural architectures lend themselves to provable learning with
gradient-based methods. One popular model is the single-index model, in which labels are
produced by composing an unknown linear projection with a possibly unknown
scalar link function. Learning this model with SGD is relatively
well-understood, whereby the so-called information exponent of the link
function governs a polynomial sample complexity rate. However, extending this
analysis to deeper or more complicated architectures remains challenging.
In this work, we consider single index learning in the setting of symmetric
neural networks. Under analytic assumptions on the activation and maximum
degree assumptions on the link function, we prove that gradient flow recovers
the hidden planted direction, represented as a finitely supported vector in the
feature space of power sum polynomials. We characterize a notion of information
exponent adapted to our setting that controls the efficiency of learning.
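
To make the setting concrete, the following is a minimal sketch, not the paper's actual construction: the symmetric input enters only through the power-sum features $p_k(x) = \sum_i x_i^k$, the planted direction is a finitely supported vector in that feature space, and plain gradient descent on the squared loss stands in for gradient flow. The quadratic link, the feature standardization, and the step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def power_sum_features(x, K):
    """Permutation-invariant features: p_k(x) = sum_i x_i**k for k = 1..K."""
    return np.array([np.sum(x ** k) for k in range(1, K + 1)])

# Planted direction: a finitely supported vector in power-sum feature space.
K, d, n = 4, 8, 4000
w_star = np.zeros(K)
w_star[2] = 1.0                        # the target depends only on p_3(x)

def link(z):                           # a known low-degree link, for illustration
    return z ** 2

X = rng.normal(size=(n, d))
Phi = np.stack([power_sum_features(x, K) for x in X])
Phi = (Phi - Phi.mean(axis=0)) / Phi.std(axis=0)   # standardize each feature
y = link(Phi @ w_star)

# Plain gradient descent on the squared loss, a stand-in for gradient flow.
w = 0.1 * rng.normal(size=K)
lr = 0.02
for _ in range(3000):
    z = Phi @ w
    # Gradient of the empirical squared loss (constant factor absorbed in lr).
    grad = Phi.T @ ((link(z) - y) * 2 * z) / n
    w -= lr * grad

cos = abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
print(f"alignment with the planted direction: {cos:.3f}")
```

How quickly this alignment escapes a small initialization is what the information-exponent notion characterized in the abstract is meant to quantify.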
Related papers
- Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions [20.036783417617652]
We investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms.
We show that a simple modification of the idealized single-pass gradient descent training scenario drastically improves its computational efficiency.
Our results highlight the ability of networks to learn relevant structures from data alone without any pre-processing.
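
As a toy illustration of this protocol contrast, the sketch below trains a quadratic student with mini-batch SGD, where repetitions=1 mimics the idealized single-pass setting and larger values reuse every batch; the student, target, and schedule are assumptions for illustration, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, B = 64, 2048, 32
theta = np.zeros(d)
theta[0] = 1.0                                   # hidden relevant direction
X = rng.normal(size=(n, d))
y = (X @ theta) ** 2 - 1.0                       # simple planted target

def train(repetitions):
    """Mini-batch SGD on a quadratic student f(x) = (w @ x)**2 - 1."""
    w = rng.normal(size=d) / np.sqrt(d)
    lr = 0.5 / d
    for _ in range(repetitions):                 # data repetition knob
        for s in range(0, n, B):
            xb, yb = X[s:s + B], y[s:s + B]
            z = xb @ w
            resid = (z * z - 1.0) - yb
            g = xb.T @ (resid * 2.0 * z) / B     # averaged squared-loss gradient
            w -= lr * g
    return abs(w[0]) / np.linalg.norm(w)         # overlap with theta

for r in (1, 4, 16):
    print(f"repetitions={r:2d}: overlap = {train(r):.3f}")
```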
arXiv Detail & Related papers (2024-05-24T11:34:31Z)
- GINN-LP: A Growing Interpretable Neural Network for Discovering Multivariate Laurent Polynomial Equations [1.1142444517901018]
We propose GINN-LP, an interpretable neural network, to discover the form of a Laurent Polynomial equation.
To the best of our knowledge, this is the first neural network that can discover arbitrary terms without any prior information on the order.
We show that GINN-LP outperforms the state-of-the-art symbolic regression methods on benchmark datasets.
arXiv Detail & Related papers (2023-12-18T03:44:29Z)
- On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
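
A single-index caricature of the two-timescale scheme may help: before every slow gradient step on the direction, the fast link estimate is refit to convergence, with a degree-5 polynomial below standing in for the non-parametric model; the tanh target and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 16, 5000
u_star = np.zeros(d)
u_star[0] = 1.0
X = rng.normal(size=(n, d))
y = np.tanh(2.0 * (X @ u_star))      # unknown link applied to a hidden direction

u = rng.normal(size=d)
u /= np.linalg.norm(u)
lr = 0.5
for _ in range(200):
    z = X @ u
    # Fast timescale: refit the link to convergence at every outer step.
    g = np.poly1d(np.polyfit(z, y, deg=5))
    gp = g.deriv()
    # Slow timescale: one projected gradient step on the direction u.
    grad = X.T @ (2.0 * (g(z) - y) * gp(z)) / n
    u -= lr * grad
    u /= np.linalg.norm(u)

print("overlap with the hidden direction:", round(abs(u @ u_star), 3))
```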
arXiv Detail & Related papers (2023-10-30T17:55:28Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
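
As a hedged sketch of that remedy: inputs below follow a rank-one spiked covariance, and the student weight is reparametrized as w = v / ||v||, a weight normalization in the spirit of (though not necessarily identical to) the scheme studied in the paper; the ReLU link and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, spike = 32, 4000, 10.0
theta = np.zeros(d)
theta[0] = 1.0
# Spiked-covariance inputs: identity plus a large rank-one spike along theta.
X = rng.normal(size=(n, d)) + np.sqrt(spike) * rng.normal(size=(n, 1)) * theta
y = np.maximum(X @ theta, 0.0)                   # ReLU link, for illustration

v = rng.normal(size=d)                           # student weight, pre-normalization
lr = 0.1
for _ in range(500):
    w = v / np.linalg.norm(v)
    z = X @ w
    resid = np.maximum(z, 0.0) - y
    grad_w = X.T @ (resid * (z > 0)) / n         # squared-loss gradient w.r.t. w
    # Chain rule through w = v/||v||: project out the radial component.
    grad_v = (grad_w - (grad_w @ w) * w) / np.linalg.norm(v)
    v -= lr * grad_v

print("overlap with theta:", round(abs((v / np.linalg.norm(v)) @ theta), 3))
```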
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- On Single Index Models beyond Gaussian Data [45.875461749455994]
Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods.
In this work, we explore extensions of this picture beyond the Gaussian setting, where both stability and symmetry might be violated.
Our main results establish that Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime.
arXiv Detail & Related papers (2023-07-28T20:52:22Z)
- Learning Single-Index Models with Shallow Neural Networks [43.6480804626033]
We introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow.
We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.
arXiv Detail & Related papers (2022-10-27T17:52:58Z)
- Neural Eigenfunctions Are Structured Representation Learners [93.53445940137618]
This paper introduces a structured, adaptive-length deep representation called Neural Eigenmap.
We show that, when the eigenfunction is derived from positive relations in a data augmentation setup, applying NeuralEF results in an objective function that resembles those of popular self-supervised learning methods.
We demonstrate using such representations as adaptive-length codes in image retrieval systems.
arXiv Detail & Related papers (2022-10-23T07:17:55Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
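
A structural sketch of the idea, not IGN's actual algorithm: a feature network maps inputs into a feature space, the inducing points Z live directly in that space, and a standard subset-of-regressors predictor is built from the inducing kernel blocks. The feature map here is a fixed random network; jointly learning it with Z, as IGN does, would require autodiff and is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)

def phi(x, W1, W2):
    """A small feature network mapping inputs into the feature space."""
    return np.tanh(x @ W1) @ W2

def rbf(A, B, ls=1.0):
    """RBF kernel between two point sets (rows are points)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

d_in, d_feat, m, n = 5, 8, 10, 200
W1 = rng.normal(size=(d_in, 16))
W2 = rng.normal(size=(16, d_feat)) / 4.0
X = rng.normal(size=(n, d_in))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

F = phi(X, W1, W2)
Z = F[rng.choice(n, m, replace=False)]    # inducing points live in feature space
Kzz = rbf(Z, Z) + 1e-6 * np.eye(m)
Kxz = rbf(F, Z)
sigma2 = 0.1                              # observation noise variance
# Subset-of-regressors predictive mean at the training inputs.
mu_z = np.linalg.solve(Kxz.T @ Kxz + sigma2 * Kzz, Kxz.T @ y)
pred = Kxz @ mu_z
print("train RMSE:", round(float(np.sqrt(np.mean((pred - y) ** 2))), 3))
```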
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [53.710405006523274]
This work focuses on the representation learning question: how can we learn such features?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
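
A tiny numeric illustration of the low-rank assumption: the dynamics factorize as P(s' | s, a) = phi(s, a)^T mu(s'), so the flattened transition matrix has rank at most r. The Dirichlet draws are just one convenient way to make both factors valid probability distributions.

```python
import numpy as np

rng = np.random.default_rng(5)
S, A, r = 6, 2, 3                             # states, actions, rank
phi = rng.dirichlet(np.ones(r), size=S * A)   # (S*A, r), rows sum to one
mu = rng.dirichlet(np.ones(S), size=r)        # (r, S), rows sum to one
P = phi @ mu                                  # a valid low-rank transition kernel
assert np.allclose(P.sum(axis=1), 1.0)        # each (s, a) row is a distribution
print("rank of the (S*A) x S transition matrix:", np.linalg.matrix_rank(P))
```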
arXiv Detail & Related papers (2020-06-18T19:11:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.