Information Bottleneck-Based Hebbian Learning Rule Naturally Ties
Working Memory and Synaptic Updates
- URL: http://arxiv.org/abs/2111.13187v1
- Date: Wed, 24 Nov 2021 17:38:32 GMT
- Title: Information Bottleneck-Based Hebbian Learning Rule Naturally Ties
Working Memory and Synaptic Updates
- Authors: Kyle Daruwalla and Mikko Lipasti
- Abstract summary: We take an alternate approach that avoids back-propagation and its associated issues entirely.
Recent work in deep learning proposed independently training each layer of a network via the information bottleneck (IB)
We show that this modulatory signal can be learned by an auxiliary circuit with working memory like a reservoir.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial neural networks have successfully tackled a large variety of
problems by training extremely deep networks via back-propagation. A direct
application of back-propagation to spiking neural networks contains
biologically implausible components, like the weight transport problem or
separate inference and learning phases. Various methods address different
components individually, but a complete solution remains intangible. Here, we
take an alternate approach that avoids back-propagation and its associated
issues entirely. Recent work in deep learning proposed independently training
each layer of a network via the information bottleneck (IB). Subsequent studies
noted that this layer-wise approach circumvents error propagation across
layers, leading to a biologically plausible paradigm. Unfortunately, the IB is
computed using a batch of samples. The prior work addresses this with a weight
update that only uses two samples (the current and previous sample). Our work
takes a different approach by decomposing the weight update into a local and
global component. The local component is Hebbian and only depends on the
current sample. The global component computes a layer-wise modulatory signal
that depends on a batch of samples. We show that this modulatory signal can be
learned by an auxiliary circuit with working memory (WM) like a reservoir.
Thus, we can use batch sizes greater than two, and the batch size determines
the required capacity of the WM. To the best of our knowledge, our rule is the
first biologically plausible mechanism to directly couple synaptic updates with
a WM of the task. We evaluate our rule on synthetic datasets and image
classification datasets like MNIST, and we explore the effect of the WM
capacity on learning performance. We hope our work is a first step towards
understanding the mechanistic role of memory in learning.
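As a concrete illustration of the decomposition described in the abstract, the sketch below is a minimal rate-based toy, not the authors' implementation; the layer sizes, tanh nonlinearities, and the fixed random reservoir readout are all assumptions. It computes a per-sample Hebbian outer product as the local component, and reads a single layer-wise modulatory scalar out of an echo-state-style reservoir whose state serves as the working memory of the batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one layer with n_in inputs and n_out outputs,
# plus a reservoir of n_res units acting as working memory for the batch.
n_in, n_out, n_res, batch = 20, 10, 100, 8

W = rng.normal(0, 0.1, (n_out, n_in))              # layer weights being learned
W_res = rng.normal(0, 0.1, (n_res, n_res))         # fixed recurrent reservoir weights
W_in_res = rng.normal(0, 1.0, (n_res, n_out))      # fixed projection of layer activity into the reservoir
w_out = rng.normal(0, 0.1, n_res)                  # readout producing the modulatory signal

def forward(x):
    """Rate-based layer activation (tanh is an assumption)."""
    return np.tanh(W @ x)

# One batch of inputs; in the paper the batch size sets the required WM capacity.
X = rng.normal(0, 1.0, (batch, n_in))

# Local component: per-sample Hebbian outer products (current sample only),
# while the reservoir accumulates a memory of the layer activity across the batch.
hebbian_terms = []
r = np.zeros(n_res)                                # reservoir state = working memory
for x in X:
    y = forward(x)
    hebbian_terms.append(np.outer(y, x))           # Hebbian: post * pre
    r = np.tanh(W_res @ r + W_in_res @ y)          # update working memory with this sample

# Global component: one layer-wise modulatory scalar read out from the reservoir.
# In the paper this readout is learned to approximate the IB-derived signal;
# here it is a fixed random readout purely for illustration.
m = np.tanh(w_out @ r)

# Modulated Hebbian update: the global signal gates the accumulated local terms.
eta = 0.01
W += eta * m * np.mean(hebbian_terms, axis=0)
```

The point mirrored here is that the only batch-dependent quantity is the modulatory scalar; each per-sample term depends solely on local pre- and post-synaptic activity, and a larger batch simply demands more working-memory capacity from the reservoir.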
Related papers
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters without accounting for either the context of the downstream task to be learned or the context of the important knowledge to be maintained.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z) - Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
However, deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z) - BatchFormer: Learning to Explore Sample Relationships for Robust
Representation Learning [93.38239238988719]
We propose to enable deep neural networks with the ability to learn the sample relationships from each mini-batch.
BatchFormer is applied into the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z) - Learning Bayesian Sparse Networks with Full Experience Replay for
Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Biologically Plausible Training Mechanisms for Self-Supervised Learning
in Deep Networks [14.685237010856953]
We develop biologically plausible training mechanisms for self-supervised learning (SSL) in deep networks.
We show that learning can be performed with either of two more plausible alternatives to back-propagation.
arXiv Detail & Related papers (2021-09-30T12:56:57Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z) - Learning Intra-Batch Connections for Deep Metric Learning [3.5665681694253903]
Metric learning aims to learn a function that maps samples to a lower-dimensional space where similar samples lie closer than dissimilar ones.
Most approaches rely on losses that only take the relations between pairs or triplets of samples into account.
We propose an approach based on message passing networks that takes into account all the relations in a mini-batch.
arXiv Detail & Related papers (2021-02-15T18:50:00Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)