Emergence of Hebbian Dynamics in Regularized Non-Local Learners
- URL: http://arxiv.org/abs/2505.18069v1
- Date: Fri, 23 May 2025 16:16:09 GMT
- Title: Emergence of Hebbian Dynamics in Regularized Non-Local Learners
- Authors: David Koplow, Tomaso Poggio, Liu Ziyin
- Abstract summary: Stochastic Gradient Descent (SGD) has emerged as a remarkably effective learning algorithm, underpinning nearly all state-of-the-art machine learning models. It is widely believed that the biological brain cannot implement gradient descent because it is nonlocal, and we have found little (if any) experimental evidence for it. In this paper, we establish a theoretical and empirical connection between the learning signals of neural networks trained using SGD with weight decay and those trained with Hebbian learning near convergence.
- Score: 1.919481257269497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic Gradient Descent (SGD) has emerged as a remarkably effective learning algorithm, underpinning nearly all state-of-the-art machine learning models, from large language models to autonomous vehicles. Despite its practical success, SGD appears fundamentally distinct from biological learning mechanisms. It is widely believed that the biological brain cannot implement gradient descent because it is nonlocal, and we have found little (if any) experimental evidence for it. In contrast, the brain is widely thought to learn via local Hebbian learning principles, which have been seen as incompatible with gradient descent. In this paper, we establish a theoretical and empirical connection between the learning signals of neural networks trained using SGD with weight decay and those trained with Hebbian learning near convergence. We show that SGD with regularization can appear to learn according to a Hebbian rule, and SGD with injected noise according to an anti-Hebbian rule. We also provide empirical evidence that Hebbian learning properties can emerge in a network with weight decay from virtually any learning rule--even a random one. These results may bridge a long-standing gap between artificial and biological learning, revealing Hebbian properties as an epiphenomenon of deeper optimization principles and cautioning against interpreting their presence in neural data as evidence against more complex hetero-synaptic mechanisms.
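The following is a minimal toy sketch, not code from the paper, illustrating the kind of alignment the abstract describes: a linear model is trained with plain mini-batch SGD plus weight decay, and the activity-driven part of the update (the gradient of the data loss, excluding the decay term) is compared against a Hebbian signal, the average product of post- and pre-synaptic activity. The task, network size, and hyperparameters are illustrative assumptions; the cosine between the two signals is small at random initialization and rises close to +1 near convergence.

```python
# Toy illustration (not the authors' code): with weight decay, the activity-driven
# part of an SGD update comes to point in the same direction as a Hebbian
# (post x pre) signal once training has roughly converged.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 512, 20, 5                     # illustrative sizes
X = rng.normal(size=(n, d_in))                  # pre-synaptic activity (inputs)
Y = X @ rng.normal(size=(d_in, d_out))          # targets from a random teacher
lr, wd, batch = 1e-2, 1e-2, 32                  # illustrative hyperparameters

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def signals(W):
    """Activity-driven learning signal vs. Hebbian signal on the full data set."""
    H = X @ W.T                                 # post-synaptic activity
    data_grad = -(H - Y).T @ X / n              # negative gradient of the MSE loss (no decay term)
    hebbian = H.T @ X / n                       # Hebbian signal: average post x pre
    return data_grad, hebbian

W = 0.1 * rng.normal(size=(d_out, d_in))
print("alignment at initialization: %+.3f" % cosine(*signals(W)))

for _ in range(20_000):                         # plain SGD with weight decay
    idx = rng.integers(0, n, size=batch)
    H = X[idx] @ W.T
    grad = (H - Y[idx]).T @ X[idx] / batch
    W -= lr * (grad + wd * W)

print("alignment near convergence:  %+.3f" % cosine(*signals(W)))
```

Intuitively, weight decay continually shrinks the weights, so near convergence the data-driven gradient must keep restoring synapses whose pre- and post-synaptic activities covary, which an outside observer would record as Hebbian plasticity.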
Related papers
- Extending Spike-Timing Dependent Plasticity to Learning Synaptic Delays [50.45313162890861]
We introduce a novel learning rule for simultaneously learning synaptic connection strengths and delays. We validate our approach by extending a widely-used SNN model for classification trained with unsupervised learning. Results demonstrate that our proposed method consistently achieves superior performance across a variety of test scenarios.
arXiv Detail & Related papers (2025-06-17T21:24:58Z)
- Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
- Emerging NeoHebbian Dynamics in Forward-Forward Learning: Implications for Neuromorphic Computing [7.345136916791223]
The Forward-Forward Algorithm (FFA) employs local learning rules for each layer.
We show that when employing a squared Euclidean norm as a goodness function driving the local learning, the resulting FFA is equivalent to a neo-Hebbian Learning Rule (a short numerical sketch of this identity appears after the related-papers list).
arXiv Detail & Related papers (2024-06-24T09:33:56Z)
- SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z)
- Single-phase deep learning in cortico-cortical networks [1.7249361224827535]
We introduce a new model, bursting cortico-cortical networks (BurstCCN), which integrates bursting activity, short-term plasticity and dendrite-targeting interneurons.
Our results suggest that cortical features across sub-cellular, cellular, microcircuit and systems levels jointly underlie single-phase efficient deep learning in the brain.
arXiv Detail & Related papers (2022-06-23T15:10:57Z)
- Neurosymbolic hybrid approach to driver collision warning [64.02492460600905]
There are two main algorithmic approaches to autonomous driving systems.
Deep learning alone has achieved state-of-the-art results in many areas.
However, deep learning models can be very difficult to debug when they fail.
arXiv Detail & Related papers (2022-03-28T20:29:50Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- A More Biologically Plausible Local Learning Rule for ANNs [6.85316573653194]
The proposed learning rule is derived from the concepts of spike-timing-dependent plasticity and neuronal association.
A preliminary evaluation on binary classification of the MNIST and IRIS datasets shows performance comparable with backpropagation.
The local nature of the learning opens up the possibility of large-scale distributed and parallel learning in the network.
arXiv Detail & Related papers (2020-11-24T10:35:47Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Local plasticity rules can learn deep representations using self-supervised contrastive predictions [3.6868085124383616]
Learning rules that respect biological constraints, yet yield deep hierarchical representations, are still unknown.
We propose a learning rule that takes inspiration from neuroscience and recent advances in self-supervised deep learning.
We find that networks trained with this self-supervised and local rule build deep hierarchical representations of images, speech and video.
arXiv Detail & Related papers (2020-10-16T09:32:35Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
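As a complement to the Forward-Forward entry above, here is a minimal sketch (an illustrative assumption, not the cited paper's code) of why a squared-Euclidean-norm goodness yields a Hebbian-form local update: for a linear layer h = W x with goodness G = ||h||^2, the gradient of G with respect to W is 2 h x^T, a post-times-pre term, so ascending G on positive data gives a Hebbian update and descending it on negative data gives an anti-Hebbian one. The finite-difference check below simply verifies this identity numerically.

```python
# Minimal check (illustrative, not the cited paper's code): the gradient of a
# squared-Euclidean-norm goodness G = ||W x||^2 with respect to W equals
# 2 * (post-synaptic activity) * (pre-synaptic activity)^T, i.e. a Hebbian term.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 8, 4                              # illustrative layer sizes
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)                       # pre-synaptic activity

def goodness(M):
    return float(np.sum((M @ x) ** 2))          # squared Euclidean norm of the layer output

hebbian = 2.0 * np.outer(W @ x, x)              # analytic gradient: 2 * post x pre

# Finite-difference gradient of the goodness with respect to each weight.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(d_out):
    for j in range(d_in):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (goodness(Wp) - goodness(Wm)) / (2 * eps)

print("max |finite-difference grad - 2*post*pre| =", np.abs(numeric - hebbian).max())
```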
This list is automatically generated from the titles and abstracts of the papers on this site.