Stochastic Gradient Descent-Induced Drift of Representation in a
Two-Layer Neural Network
- URL: http://arxiv.org/abs/2302.02563v2
- Date: Tue, 6 Jun 2023 16:45:47 GMT
- Title: Stochastic Gradient Descent-Induced Drift of Representation in a
Two-Layer Neural Network
- Authors: Farhad Pashakhanloo, Alexei Koulakov
- Abstract summary: Although drift is observed both in the brain and in artificial networks, its mechanisms and implications are not fully understood.
Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representational drift refers to changes in neural activation over
time that are accompanied by stable task performance. Although drift has been
observed both in the brain and in artificial networks, its mechanisms and
implications are not fully understood. Motivated by recent experimental findings of
stimulus-dependent drift in the piriform cortex, we use theory and simulations
to study this phenomenon in a two-layer linear feedforward network.
Specifically, in a continual online learning scenario, we study the drift
induced by the noise inherent in the Stochastic Gradient Descent (SGD). By
decomposing the learning dynamics into the normal and tangent spaces of the
minimum-loss manifold, we show that the former corresponds to a finite-variance
fluctuation, while the latter can be described as an effective diffusion
process on the manifold. We analytically compute the fluctuation and the
diffusion coefficients for the stimuli representations in the hidden layer as
functions of network parameters and input distribution. Further, consistent
with experiments, we show that the drift rate is slower for a more frequently
presented stimulus. Overall, our analysis yields a theoretical framework for
a better understanding of the drift phenomenon in biological and artificial
neural networks.
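As a concrete illustration of the setup described in the abstract, the following is a minimal simulation sketch (not the authors' code) of SGD-induced drift in a two-layer linear network y = W2 W1 x trained online on noisy targets. All specifics here (the target map A, the two-stimulus ensemble, the presentation probabilities, the noise level sigma, and the learning rate eta) are assumed toy choices; the sketch only illustrates the qualitative picture the abstract describes: the end-to-end map W2 W1 fluctuates with finite variance around the minimum-loss manifold, while the hidden representation W1 x drifts along the manifold's degenerate (tangent) directions.

```python
# Illustrative sketch (assumptions, not the paper's code): online SGD on a
# two-layer linear network y = W2 @ W1 @ x. Targets carry additive noise so
# that per-sample gradients remain non-zero at the minimum of the average
# loss; this residual SGD noise drives drift of the hidden representation
# h = W1 @ x while the end-to-end map W2 @ W1 stays close to the target.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 8, 16, 8
eta = 0.005          # learning rate (assumed toy value)
sigma = 0.1          # target noise level (assumed toy value)
n_steps = 50_000

# Fixed target linear map the network has to keep reproducing.
A = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)

# Two stimuli presented with unequal frequencies; the paper predicts slower
# drift for the more frequently presented stimulus.
stimuli = rng.normal(size=(2, n_in))
probs = np.array([0.8, 0.2])

# Initialize on the minimum-loss manifold: W2 @ W1 = A (W1 spans the full
# input space here, so pinv(W1) @ W1 is the identity).
W1 = rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)
W2 = A @ np.linalg.pinv(W1)

h_ref = stimuli @ W1.T            # reference hidden representations
drift = np.zeros((n_steps, 2))    # per-stimulus drift of the hidden layer
loss = np.zeros(n_steps)          # noise-free loss over both stimuli

for t in range(n_steps):
    i = rng.choice(2, p=probs)                    # online draw of one stimulus
    x = stimuli[i]
    y = A @ x + sigma * rng.normal(size=n_out)    # noisy target
    err = W2 @ (W1 @ x) - y
    # SGD step on L = 0.5 * ||W2 W1 x - y||^2 for this single sample.
    W2 -= eta * np.outer(err, W1 @ x)
    W1 -= eta * np.outer(W2.T @ err, x)

    drift[t] = np.linalg.norm(stimuli @ W1.T - h_ref, axis=1)
    loss[t] = 0.5 * np.sum((W2 @ W1 @ stimuli.T - A @ stimuli.T) ** 2)

print("mean loss over last 1000 steps:", loss[-1000:].mean())
print("hidden-layer drift (frequent, rare stimulus):", drift[-1])
```

Under these assumptions one can, for example, compare the two columns of drift to probe the stimulus-frequency effect, or replace the two-stimulus ensemble with a richer input distribution; the analytical fluctuation and diffusion coefficients in the paper are functions of exactly these quantities (network parameters and input statistics).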
Related papers
- On the dynamics of three-layer neural networks: initial condensation [2.022855152231054]
Condensation occurs when gradient methods spontaneously reduce the complexity of neural networks.
We establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation.
We also explore the association between condensation and the low-rank bias observed in deep matrix factorization.
arXiv Detail & Related papers (2024-02-25T02:36:14Z)
- The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks.
We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z)
- SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- A Predictive Coding Account for Chaotic Itinerancy [68.8204255655161]
We show how a recurrent neural network implementing predictive coding can generate neural trajectories similar to chaotic itinerancy in the presence of input noise.
We propose two scenarios generating random and past-independent attractor switching trajectories using our model.
arXiv Detail & Related papers (2021-06-16T16:48:14Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)