LoCo: Local Contrastive Representation Learning
- URL: http://arxiv.org/abs/2008.01342v2
- Date: Sun, 29 Nov 2020 08:54:06 GMT
- Title: LoCo: Local Contrastive Representation Learning
- Authors: Yuwen Xiong, Mengye Ren, Raquel Urtasun
- Abstract summary: We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
- Score: 93.98029899866866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural nets typically perform end-to-end backpropagation to learn the
weights, a procedure that creates synchronization constraints in the weight
update step across layers and is not biologically plausible. Recent advances in
unsupervised contrastive representation learning point to the question of
whether a learning algorithm can also be made local, that is, the updates of
lower layers do not directly depend on the computation of upper layers. While
Greedy InfoMax separately learns each block with a local objective, we found
that it consistently hurts readout accuracy in state-of-the-art unsupervised
contrastive learning algorithms, possibly due to the greedy objective as well
as gradient isolation. In this work, we discover that by overlapping local
blocks stacked on top of each other, we effectively increase the decoder depth
and allow upper blocks to implicitly send feedback to lower blocks. This
simple design closes the performance gap between local learning and end-to-end
contrastive learning algorithms for the first time. Aside from standard
ImageNet experiments, we also show results on complex downstream tasks such as
object detection and instance segmentation directly using readout features.
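The abstract states the mechanism only at a high level, so here is a minimal sketch of how overlapping local blocks with per-block contrastive losses can be wired up. It assumes a SimCLR-style InfoNCE objective, generic convolutional stages, and that `feat_dims[i]` is the output channel count of `stages[i]`; the names (`LoCoStyleEncoder`, `info_nce`, `local_losses`) are illustrative, and the sketch recomputes the shared stage rather than reusing its activations, so it is not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """SimCLR-style InfoNCE between matching pairs of projections."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (N, N) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

class LoCoStyleEncoder(nn.Module):
    """Overlapping local blocks: block k = (stage k, stage k+1).

    Each block gets a gradient-isolated input and its own contrastive loss,
    so the shared middle stage receives gradients from two local losses and
    implicitly relays feedback from the upper block to the lower one.
    """
    def __init__(self, stages, feat_dims, proj_dim=128):
        super().__init__()
        self.stages = nn.ModuleList(stages)             # e.g. 4 conv stages
        # one projection head per local block (num_blocks = num_stages - 1)
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(feat_dims[k + 1], proj_dim))
            for k in range(len(stages) - 1)
        ])

    def local_losses(self, x1, x2):
        """x1, x2: two augmented views of the same batch."""
        losses, h1, h2 = [], x1, x2
        for k in range(len(self.stages) - 1):
            a1, a2 = h1.detach(), h2.detach()           # gradient isolation
            s1, s2 = self.stages[k](a1), self.stages[k](a2)
            # the block spans two stages, so stage k+1 is shared with block k+1
            z1 = self.heads[k](self.stages[k + 1](s1))
            z2 = self.heads[k](self.stages[k + 1](s2))
            losses.append(info_nce(z1, z2))
            h1, h2 = s1, s2                             # trunk input for next block
        return losses

# usage: sum(model.local_losses(view1, view2)).backward()
# each stage only receives gradients from the (at most two) blocks containing it
```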
Related papers
- Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation [70.43845294145714]
Relieving the reliance of neural network training on global back-propagation (BP) has emerged as a notable research topic.
We propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules.
Our method can be integrated into both local-BP and BP-free settings.
arXiv Detail & Related papers (2024-06-07T19:10:31Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks [9.718519843862937]
We introduce a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize sub-neural networks separately.
Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations.
arXiv Detail & Related papers (2023-12-20T08:02:33Z)
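The BWBPF entry above trains sub-networks from local error signals only. A minimal sketch of that general pattern, assuming each block gets its own auxiliary classifier and a detached copy of the previous block's output (the names `BlockwiseLocalNet` and `aux_heads` are illustrative, and the paper's exact architecture and error signals may differ):

```python
import torch.nn as nn

class BlockwiseLocalNet(nn.Module):
    """Blocks trained separately from local error signals (no end-to-end BP)."""
    def __init__(self, blocks, feat_dims, num_classes):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        # one auxiliary classifier per block provides its local error signal
        self.aux_heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(d, num_classes))
            for d in feat_dims
        ])
        self.criterion = nn.CrossEntropyLoss()

    def local_losses(self, x, y):
        losses, h = [], x
        for block, head in zip(self.blocks, self.aux_heads):
            h = block(h.detach())        # detach: no gradient crosses block boundary
            losses.append(self.criterion(head(h), y))
        return losses                    # sum(...).backward() updates each block locally
```

Because each loss touches only one block's parameters, the per-block updates can in principle run in parallel, which is the decoupling the paper exploits.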
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
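To make "directly outputs label distributions at each cascaded block" concrete, here is a hedged sketch: every block feeds its own predictor head, each head is trained against the targets with no negative samples, and (as one plausible readout, not necessarily the paper's) the per-block distributions are averaged at inference. `CascadedPredictor` and its internals are illustrative names, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedPredictor(nn.Module):
    """Cascaded blocks, each emitting its own label distribution."""
    def __init__(self, blocks, feat_dims, num_classes):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.predictors = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(d, num_classes))
            for d in feat_dims
        ])

    def forward(self, x, y=None):
        dists, losses, h = [], [], x
        for block, pred in zip(self.blocks, self.predictors):
            h = block(h.detach())                    # blocks are gradient-isolated
            p = F.log_softmax(pred(h), dim=1)        # per-block label distribution
            dists.append(p)
            if y is not None:                        # local supervised objective,
                losses.append(F.nll_loss(p, y))      # no negative samples needed
        # one plausible readout: average the per-block distributions
        return torch.stack(dists).mean(0), losses
```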
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
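The key ingredient above is perturbing activations rather than weights when estimating forward gradients. Below is a hedged single-split sketch of that estimator using PyTorch's `torch.func.jvp`; the helper name and the two-module split are illustrative, the upper module's own local update is omitted, and this is not the paper's full local-loss architecture.

```python
import torch

def activity_perturbation_step(lower, upper, x, y, loss_fn):
    """One unbiased forward-gradient step for `lower`, perturbing activations.

    lower, upper: nn.Modules splitting the network at the activation `a`.
    The directional derivative of the loss w.r.t. `a` is taken along a random
    direction v with forward-mode autodiff, giving the estimate (dL/da . v) v,
    whose variance scales with the activation size rather than the weight size.
    """
    a = lower(x)                          # keep the graph for the local backward
    v = torch.randn_like(a)               # random perturbation direction

    def head_loss(act):
        return loss_fn(upper(act), y)

    # forward-mode JVP: directional derivative of the loss along v, evaluated at a
    _, dL_dv = torch.func.jvp(head_loss, (a.detach(),), (v,))
    g_a = dL_dv * v                       # forward-gradient estimate of dL/da
    a.backward(g_a)                       # local backprop inside `lower` only
```

In the paper this estimator is paired with many local losses so that each perturbed activation group stays small, which is what keeps the variance of the estimate manageable.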
- Stacked unsupervised learning with a network architecture found by supervised meta-learning [4.209801809583906]
Stacked unsupervised learning (SUL) seems more biologically plausible than backpropagation.
But SUL has fallen far short of backpropagation in practical applications.
We show an SUL algorithm that can perform completely unsupervised clustering of MNIST digits.
arXiv Detail & Related papers (2022-06-06T16:17:20Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
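As a rough illustration of "squeeze the input feature into a channel-wise global vector" and reason there instead of on the spatial map: the sketch below uses global average pooling for the squeeze and a small bottleneck as a stand-in for the reasoning step, so treat it as an SE-style approximation rather than the paper's Squeeze Reasoning block; `SqueezeReasonLite` is an illustrative name.

```python
import torch.nn as nn

class SqueezeReasonLite(nn.Module):
    """Squeeze to a channel vector, reason in that compact space, re-modulate.

    A stand-in for the Squeeze Reasoning idea: spatial information is collapsed
    once, reasoning happens on a C-dimensional vector (cheap), and the result
    recalibrates the original spatial feature map via a residual connection.
    """
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # (N, C, H, W) -> (N, C, 1, 1)
        self.reason = nn.Sequential(                      # stand-in reasoning step
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        gate = self.reason(self.squeeze(x))               # channel-wise vector
        return x + x * gate                               # modulate the spatial map
```

Because it is a drop-in residual block, it can be inserted after any stage of an existing backbone, matching the "easily plugged into existing networks" claim.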
- Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via Accelerated Downsampling [19.025707054206457]
Layer-wise learning can achieve state-of-the-art performance in image classification on various datasets.
Previous studies of layer-wise learning are limited to networks with simple hierarchical structures.
This paper reveals that the fundamental obstacle to scaling up layer-wise learning is the relatively poor separability of the feature space in shallow layers.
arXiv Detail & Related papers (2020-10-15T21:51:43Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)