Biologically Plausible Training Mechanisms for Self-Supervised Learning
in Deep Networks
- URL: http://arxiv.org/abs/2109.15089v2
- Date: Fri, 1 Oct 2021 21:40:28 GMT
- Title: Biologically Plausible Training Mechanisms for Self-Supervised Learning
in Deep Networks
- Authors: Mufeng Tang, Yibo Yang, Yali Amit
- Abstract summary: We develop biologically plausible training mechanisms for self-supervised learning (SSL) in deep networks.
We show that learning can be performed with one of two more plausible alternatives to backpropagation.
- Score: 14.685237010856953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop biologically plausible training mechanisms for self-supervised
learning (SSL) in deep networks. SSL, with a contrastive loss, is more natural
as it does not require labelled data and its robustness to perturbations yields
more adaptable embeddings. Moreover, the perturbation of data required to create
positive pairs for SSL is easily produced in a natural environment by observing
objects in motion and with variable lighting over time. We propose a
contrastive hinge-based loss whose error involves simple local computations as
opposed to the standard contrastive losses employed in the literature, which do
not lend themselves easily to implementation in a network architecture due to
complex computations involving ratios and inner products. Furthermore we show
that learning can be performed with one of two more plausible alternatives to
backpropagation. The first is difference target propagation (DTP), which trains
network parameters using target-based local losses and employs a Hebbian
learning rule, thus overcoming the biologically implausible symmetric weight
problem in backpropagation. The second is simply layer-wise learning, where
each layer is directly connected to a layer computing the loss error. The
layers are either updated sequentially in a greedy fashion (GLL) or in random
order (RLL), and each training stage involves a single hidden layer network.
The one-step backpropagation needed for each such network can be performed either
with fixed random feedback weights, as proposed in Lillicrap et al. (2016), or
with updated random feedback, as in Amit (2019). Both methods represent
alternatives to the symmetric weight issue of backpropagation. By training
convolutional neural networks (CNNs) with SSL and DTP, GLL or RLL, we find that
our proposed framework achieves comparable performance to its implausible
counterparts in both linear evaluation and transfer learning tasks.
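A minimal sketch of the hinge-based contrastive loss described in the abstract, assuming a margin over squared Euclidean distances between embeddings of a positive pair and a negative sample (the paper's exact formulation, margin value, and negative-sampling scheme are not given here, so these choices are illustrative):
```python
import torch
import torch.nn.functional as F

def contrastive_hinge_loss(z_anchor, z_positive, z_negative, margin=1.0):
    """Hinge-style contrastive loss built from local squared distances only,
    with no ratios or normalized inner products as in InfoNCE-type losses."""
    d_pos = ((z_anchor - z_positive) ** 2).sum(dim=1)  # distance to the perturbed view
    d_neg = ((z_anchor - z_negative) ** 2).sum(dim=1)  # distance to an unrelated sample
    # zero loss once the negative is farther away than the positive by `margin`
    return F.relu(margin + d_pos - d_neg).mean()

# toy usage with random 128-dimensional embeddings for a batch of 8
z_a, z_p, z_n = (torch.randn(8, 128) for _ in range(3))
loss = contrastive_hinge_loss(z_a, z_p, z_n)
```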
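Difference target propagation, the first backpropagation alternative mentioned above, replaces backpropagated gradients with layer-local targets computed through learned approximate inverses. A rough numpy sketch of one DTP step for a small fully connected network follows; the layer sizes, tanh nonlinearities, squared-error top loss, and step sizes are assumptions for illustration, and training of the inverse weights (normally done with a reconstruction loss) is omitted:
```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [32, 64, 64, 10]
W = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l])) for l in range(3)]  # forward weights
V = [rng.normal(scale=0.1, size=(sizes[l], sizes[l + 1])) for l in range(3)]  # approximate inverses
f = lambda W_l, a: np.tanh(W_l @ a)   # forward mapping of one layer
g = lambda V_l, a: np.tanh(V_l @ a)   # learned approximate inverse mapping
lr, lr_top = 0.01, 0.1

x, y = rng.normal(size=sizes[0]), rng.normal(size=sizes[-1])

# forward pass, keeping every layer's activity
h = [x]
for l in range(3):
    h.append(f(W[l], h[l]))

# top target: nudge the output down the gradient of a squared-error loss
t = [None] * 4
t[3] = h[3] - lr_top * (h[3] - y)

# difference target propagation: send targets down through the inverses,
# with a correction for the inverses being only approximate
for l in range(3, 1, -1):
    t[l - 1] = h[l - 1] + g(V[l - 1], t[l]) - g(V[l - 1], h[l])

# purely local, layer-wise updates: each layer pulls its output toward its target
for l in range(3):
    pre = W[l] @ h[l]
    err = (f(W[l], h[l]) - t[l + 1]) * (1.0 - np.tanh(pre) ** 2)  # gradient of the local loss
    W[l] -= lr * np.outer(err, h[l])
```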
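The layer-wise alternatives (GLL and RLL) train one single-hidden-layer network per stage, and the one-step error propagation inside each such network can use fixed random feedback weights (Lillicrap et al., 2016) instead of the transpose of the forward weights. A toy numpy sketch of that update rule on a squared-error loss follows; the layer sizes, learning rate, and random regression data are illustrative assumptions:
```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 32, 64, 10
W1 = rng.normal(scale=0.1, size=(d_hid, d_in))   # hidden-layer weights
W2 = rng.normal(scale=0.1, size=(d_out, d_hid))  # output-layer weights
B  = rng.normal(scale=0.1, size=(d_hid, d_out))  # fixed random feedback weights
lr = 0.01

for _ in range(100):                      # toy regression on random data
    x = rng.normal(size=d_in)
    target = rng.normal(size=d_out)
    hdn = np.maximum(0.0, W1 @ x)         # hidden ReLU activity
    out = W2 @ hdn                        # linear output
    err = out - target                    # output error of a squared-error loss
    W2 -= lr * np.outer(err, hdn)         # local delta rule at the top layer
    delta = (B @ err) * (hdn > 0)         # error routed through B, not W2.T
    W1 -= lr * np.outer(delta, x)         # hidden-layer update via random feedback
```
In the updated-random-feedback variant (Amit, 2019), B would itself be adjusted during training rather than held fixed.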
Related papers
- Desire Backpropagation: A Lightweight Training Algorithm for Multi-Layer
Spiking Neural Networks based on Spike-Timing-Dependent Plasticity [13.384228628766236]
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks.
We present desire backpropagation, a method to derive the desired spike activity of all neurons, including the hidden ones.
We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached an accuracy of 98.41% and 87.56%, respectively.
arXiv Detail & Related papers (2022-11-10T08:32:13Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z) - Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
Deep neural networks have been shown to outperform traditional machine learning methods.
However, deep networks lack generalisability: they do not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Information Bottleneck-Based Hebbian Learning Rule Naturally Ties
Working Memory and Synaptic Updates [0.0]
We take an alternate approach that avoids back-propagation and its associated issues entirely.
Recent work in deep learning proposed independently training each layer of a network via the information bottleneck (IB).
We show that this modulatory signal can be learned by an auxiliary circuit with working memory like a reservoir.
arXiv Detail & Related papers (2021-11-24T17:38:32Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed follows this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.