Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2503.23642v1
- Date: Mon, 31 Mar 2025 01:07:30 GMT
- Title: Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent
- Authors: Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi
- Abstract summary: We investigate the problem of learning a Single Index Model (SIM) for studying the ability of neural networks to learn features. In this study, we analyze the learning dynamics of vanilla Stochastic Gradient Descent (SGD) under the SIM with anisotropic input data. We derive upper and lower bounds on the sample complexity using a notion of effective dimension that is determined by the structure of the covariance matrix.
- Score: 7.8378818005171125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of learning a Single Index Model (SIM) - a popular model for studying the ability of neural networks to learn features - from anisotropic Gaussian inputs by training a neuron using vanilla Stochastic Gradient Descent (SGD). While the isotropic case has been extensively studied, the anisotropic case has received less attention and the impact of the covariance matrix on the learning dynamics remains unclear. For instance, Mousavi-Hosseini et al. (2023b) proposed a spherical SGD that requires a separate estimation of the data covariance matrix, thereby oversimplifying the influence of covariance. In this study, we analyze the learning dynamics of vanilla SGD under the SIM with anisotropic input data, demonstrating that vanilla SGD automatically adapts to the data's covariance structure. Leveraging these results, we derive upper and lower bounds on the sample complexity using a notion of effective dimension that is determined by the structure of the covariance matrix instead of the input data dimension.
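To make the setup concrete, here is a minimal sketch of vanilla online SGD on a single neuron fitting a SIM with anisotropic Gaussian inputs. The link function, spiked spectrum, step size, and iteration count are illustrative assumptions, not the paper's experimental configuration.

```python
# A hedged sketch of the setup: online SGD on a single neuron fitting a
# Single Index Model y = g(<w*, x>) with anisotropic Gaussian inputs
# x ~ N(0, Sigma). All constants below are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d = 50
# Spiked diagonal covariance spectrum (an assumed anisotropic structure).
spectrum = np.array([10.0] * 5 + [1.0] * (d - 5))
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)  # hidden direction of the SIM


def g(z):
    return np.tanh(z)  # illustrative link function


def g_prime(z):
    return 1.0 - np.tanh(z) ** 2


w = rng.standard_normal(d) / np.sqrt(d)  # random initialization
eta = 0.01                               # step size (assumed, not tuned)

for _ in range(20_000):
    # One fresh sample x ~ N(0, diag(spectrum)) per step: online SGD.
    x = np.sqrt(spectrum) * rng.standard_normal(d)
    y = g(x @ w_star)
    # Squared-loss gradient for a single neuron; no spherical projection
    # and no covariance pre-whitening -- this is what "vanilla" SGD means here.
    grad = (g(x @ w) - y) * g_prime(x @ w) * x
    w -= eta * grad

# Alignment with the hidden direction as a simple progress measure.
print(abs(w @ w_star) / np.linalg.norm(w))
```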
Related papers
- Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure [0.8287206589886879]
We study the training and generalization performance of two-layer neural networks (NNs) after one gradient descent step under structured data. We prove that a high-order model performs equivalently to the nonlinear neural network under certain conditions.
arXiv Detail & Related papers (2025-03-02T11:28:54Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
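A hedged sketch of the normalization idea: rather than projecting the weight onto the unit sphere, rescale the neuron's pre-activation to unit variance over a batch, in the spirit of batch normalization. The function below is an illustrative reading of the abstract, not the authors' implementation.

```python
# Batch-norm-style normalization of a single neuron's pre-activation,
# making the dynamics insensitive to the scale w picks up from an
# anisotropic covariance (epsilon and centering are assumed details).
import numpy as np

def normalized_preactivation(w, X, eps=1e-8):
    """X: (batch, d) inputs. Returns <w, x> rescaled to zero mean and
    unit standard deviation across the batch."""
    z = X @ w
    return (z - z.mean()) / (z.std() + eps)
```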
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
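As a toy illustration of this recover-parameters-by-autodiff loop, the sketch below uses a closed-form stand-in for the trained network; the "physics" (a two-parameter dispersion) and all values are invented for the example.

```python
# A minimal sketch (assumptions throughout): a differentiable surrogate
# stands in for the network trained on simulated Hamiltonian data, and
# automatic differentiation recovers unknown parameters by fitting it
# to (here, synthetic) measurements.
import jax
import jax.numpy as jnp

def surrogate(params, q):
    # Stand-in for the trained model: a dispersion with unknown
    # coupling J and gap Delta (purely illustrative).
    J, Delta = params
    return Delta + 2.0 * J * jnp.cos(q)

q_grid = jnp.linspace(0.0, jnp.pi, 64)
observed = surrogate(jnp.array([1.5, 0.3]), q_grid)  # synthetic "experiment"

def loss(params):
    return jnp.mean((surrogate(params, q_grid) - observed) ** 2)

params = jnp.array([0.5, 0.0])
grad_fn = jax.grad(loss)
for _ in range(500):
    params = params - 0.1 * grad_fn(params)  # plain gradient descent
print(params)  # approaches the true (J, Delta) = (1.5, 0.3)
```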
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for the reweighted score function.
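A rough sketch of the reweighting mechanism, under the assumption that poorly fit samples are upweighted between updates of the underlying differentiable model; the softmax form and temperature are illustrative choices, not the paper's exact rule.

```python
# Adaptive sample reweighting for a differentiable score function
# (illustrative form, not the authors' implementation).
import numpy as np

def adaptive_weights(per_sample_loss, temperature=1.0):
    """Softmax weights over samples: harder samples get larger weight."""
    z = per_sample_loss / temperature
    z = z - z.max()          # numerical stability
    w = np.exp(z)
    return w / w.sum()

def reweighted_score(per_sample_loss, weights):
    # Weighted score to be minimized over the candidate graph, while the
    # weights are re-adapted between optimization steps.
    return np.sum(weights * per_sample_loss)
```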
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z) - Learning from few examples with nonlinear feature maps [68.8204255655161]
We explore the phenomenon of learning from few examples and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities.
The main thrust of our present analysis is on the influence of nonlinear feature transformations mapping original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities.
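One standard example of such a nonlinear map (illustrative here, not necessarily the transformations studied in the paper) is random Fourier features, which lift inputs toward an approximation of the infinite-dimensional RBF-kernel feature space.

```python
# Random Fourier features: an approximate, finite-dimensional embedding
# of the RBF kernel's infinite-dimensional feature space.
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Map X of shape (n, d) to (n, n_features), approximating
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```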
arXiv Detail & Related papers (2022-03-31T10:36:50Z) - Learning Invariant Weights in Neural Networks [16.127299898156203]
Many commonly used models in machine learning are constrained to respect certain symmetries in the data.
We propose a weight-space equivalent to this approach, by minimizing a lower bound on the marginal likelihood to learn invariances in neural networks.
arXiv Detail & Related papers (2022-02-25T00:17:09Z) - Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z) - Learning Curves for SGD on Structured Features [23.40229188549055]
We show that modeling the geometry of the data in the induced feature space is crucial to accurately predict the test error throughout learning.
arXiv Detail & Related papers (2021-06-04T20:48:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.