Stochastic Gradient Descent-Induced Drift of Representation in a
Two-Layer Neural Network
- URL: http://arxiv.org/abs/2302.02563v2
- Date: Tue, 6 Jun 2023 16:45:47 GMT
- Title: Stochastic Gradient Descent-Induced Drift of Representation in a
Two-Layer Neural Network
- Authors: Farhad Pashakhanloo, Alexei Koulakov
- Abstract summary: Although drift is observed both in the brain and in artificial networks, its mechanisms and implications are not fully understood.
Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representational drift refers to changes in neural activation over
time that are accompanied by stable task performance. Although drift has been
observed both in the brain and in artificial networks, its mechanisms and
implications are not fully understood. Motivated by recent experimental findings of
stimulus-dependent drift in the piriform cortex, we use theory and simulations
to study this phenomenon in a two-layer linear feedforward network.
Specifically, in a continual online learning scenario, we study the drift
induced by the noise inherent in the Stochastic Gradient Descent (SGD). By
decomposing the learning dynamics into the normal and tangent spaces of the
minimum-loss manifold, we show that the former corresponds to a finite-variance
fluctuation, while the latter can be described as an effective diffusion
process on the manifold. We analytically compute the fluctuation and the
diffusion coefficients for the stimuli representations in the hidden layer as
functions of network parameters and input distribution. Further, consistent
with experiments, we show that the drift rate is slower for a more frequently
presented stimulus. Overall, our analysis yields a theoretical framework for
a better understanding of the drift phenomenon in biological and artificial
neural networks.
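As a concrete illustration of the setup described in the abstract, the following is a minimal simulation sketch (not the authors' code) of SGD-induced drift in a two-layer linear network y = W2 W1 x trained online on noisy targets. All specifics here (the target map A, the two-stimulus ensemble, the presentation probabilities, the noise level sigma, and the learning rate eta) are assumed toy choices; the sketch only illustrates the qualitative picture the abstract describes: the end-to-end map W2 W1 fluctuates with finite variance around the minimum-loss manifold, while the hidden representation W1 x drifts along the manifold's degenerate (tangent) directions.

```python
# Illustrative sketch (assumptions, not the paper's code): online SGD on a
# two-layer linear network y = W2 @ W1 @ x. Targets carry additive noise so
# that per-sample gradients remain non-zero at the minimum of the average
# loss; this residual SGD noise drives drift of the hidden representation
# h = W1 @ x while the end-to-end map W2 @ W1 stays close to the target.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 8, 16, 8
eta = 0.005          # learning rate (assumed toy value)
sigma = 0.1          # target noise level (assumed toy value)
n_steps = 50_000

# Fixed target linear map the network has to keep reproducing.
A = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)

# Two stimuli presented with unequal frequencies; the paper predicts slower
# drift for the more frequently presented stimulus.
stimuli = rng.normal(size=(2, n_in))
probs = np.array([0.8, 0.2])

# Initialize on the minimum-loss manifold: W2 @ W1 = A (W1 spans the full
# input space here, so pinv(W1) @ W1 is the identity).
W1 = rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)
W2 = A @ np.linalg.pinv(W1)

h_ref = stimuli @ W1.T            # reference hidden representations
drift = np.zeros((n_steps, 2))    # per-stimulus drift of the hidden layer
loss = np.zeros(n_steps)          # noise-free loss over both stimuli

for t in range(n_steps):
    i = rng.choice(2, p=probs)                    # online draw of one stimulus
    x = stimuli[i]
    y = A @ x + sigma * rng.normal(size=n_out)    # noisy target
    err = W2 @ (W1 @ x) - y
    # SGD step on L = 0.5 * ||W2 W1 x - y||^2 for this single sample.
    W2 -= eta * np.outer(err, W1 @ x)
    W1 -= eta * np.outer(W2.T @ err, x)

    drift[t] = np.linalg.norm(stimuli @ W1.T - h_ref, axis=1)
    loss[t] = 0.5 * np.sum((W2 @ W1 @ stimuli.T - A @ stimuli.T) ** 2)

print("mean loss over last 1000 steps:", loss[-1000:].mean())
print("hidden-layer drift (frequent, rare stimulus):", drift[-1])
```

Under these assumptions one can, for example, compare the two columns of drift to probe the stimulus-frequency effect, or replace the two-stimulus ensemble with a richer input distribution; the analytical fluctuation and diffusion coefficients in the paper are functions of exactly these quantities (network parameters and input statistics).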
Related papers
- On the dynamics of three-layer neural networks: initial condensation [2.022855152231054]
Condensation occurs when gradient methods spontaneously reduce the complexity of neural networks.
We establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation.
We also explore the association between condensation and the low-rank bias observed in deep matrix factorization.
arXiv Detail & Related papers (2024-02-25T02:36:14Z)
- The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks.
We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z)
- SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- A Predictive Coding Account for Chaotic Itinerancy [68.8204255655161]
We show how a recurrent neural network implementing predictive coding can generate neural trajectories similar to chaotic itinerancy in the presence of input noise.
We propose two scenarios generating random and past-independent attractor switching trajectories using our model.
arXiv Detail & Related papers (2021-06-16T16:48:14Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)