Understanding Self-supervised Learning with Dual Deep Networks
- URL: http://arxiv.org/abs/2010.00578v6
- Date: Mon, 15 Feb 2021 04:51:42 GMT
- Title: Understanding Self-supervised Learning with Dual Deep Networks
- Authors: Yuandong Tian and Lantao Yu and Xinlei Chen and Surya Ganguli
- Abstract summary: We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a \emph{covariance operator}.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a \emph{hierarchical latent tree model} (HLTM).
- Score: 74.92916579635336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel theoretical framework to understand contrastive
self-supervised learning (SSL) methods that employ dual pairs of deep ReLU
networks (e.g., SimCLR). First, we prove that in each SGD update of SimCLR with
various loss functions, including simple contrastive loss, soft Triplet loss
and InfoNCE loss, the weights at each layer are updated by a \emph{covariance
operator} that specifically amplifies initial random selectivities that vary
across data samples but survive averages over data augmentations. To further
study what role the covariance operator plays and which features are learned in
such a process, we model data generation and augmentation processes through a
\emph{hierarchical latent tree model} (HLTM) and prove that the hidden neurons
of deep ReLU networks can learn the latent variables in HLTM, despite the fact
that the network receives \emph{no direct supervision} from these unobserved
latent variables. This leads to a provable emergence of hierarchical features
through the amplification of initially random selectivities through contrastive
SSL. Extensive numerical studies justify our theoretical findings. Code is
released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.
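
For concreteness, the following is a minimal, illustrative sketch (not the authors' released implementation; see the repository above) of the dual-network SimCLR setup the abstract analyzes: the same deep ReLU encoder processes two augmentations of each sample, and an InfoNCE / NT-Xent loss contrasts matching views against the rest of the batch. The encoder sizes, toy augmentations, and temperature below are assumptions made purely for illustration.

```python
# Minimal sketch of a SimCLR-style dual-network contrastive update (illustrative only).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """InfoNCE / NT-Xent loss for paired embeddings z1, z2 of shape (N, d)."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, d), unit-norm embeddings
    sim = z @ z.t() / temperature                             # (2N, 2N) scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-similarity
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])  # positive of view i is its other view
    return F.cross_entropy(sim, targets)

# The same deep ReLU encoder (weights shared across the "dual pair") processes both views.
encoder = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32)
)
x = torch.randn(16, 128)                      # a batch of raw inputs
x1 = x + 0.1 * torch.randn_like(x)            # toy stand-in for data augmentation
x2 = x + 0.1 * torch.randn_like(x)
loss = nt_xent_loss(encoder(x1), encoder(x2))
loss.backward()                               # per the abstract, this layerwise update takes the form of a covariance operator
```

Per the abstract, the gradient of such a loss gives, at each layer, a covariance-operator update that amplifies initial random selectivities which vary across samples but survive averaging over augmentations.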
Related papers
- Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Discriminability-enforcing loss to improve representation learning [20.4701676109641]
We introduce a new loss term inspired by the Gini impurity to minimize the entropy of individual high-level features.
Although our Gini loss induces highly discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes.
Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms models trained with cross-entropy alone (an illustrative sketch of a Gini-style penalty follows the related-papers list).
arXiv Detail & Related papers (2022-02-14T22:31:37Z) - Information Bottleneck-Based Hebbian Learning Rule Naturally Ties
Working Memory and Synaptic Updates [0.0]
We take an alternate approach that avoids back-propagation and its associated issues entirely.
Recent work in deep learning proposed independently training each layer of a network via the information bottleneck (IB).
We show that this modulatory signal can be learned by an auxiliary circuit with working memory like a reservoir.
arXiv Detail & Related papers (2021-11-24T17:38:32Z) - Biologically Plausible Training Mechanisms for Self-Supervised Learning
in Deep Networks [14.685237010856953]
We develop biologically plausible training mechanisms for self-supervised learning (SSL) in deep networks.
We show that learning can be performed with either of two more biologically plausible alternatives to backpropagation.
arXiv Detail & Related papers (2021-09-30T12:56:57Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow (GSMFlow) framework for synthesizing unseen data efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical-structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Enabling Continual Learning with Differentiable Hebbian Plasticity [18.12749708143404]
Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge.
Catastrophic forgetting poses a grand challenge for neural networks performing such a learning process.
We propose a Differentiable Hebbian Consolidation model built on Differentiable Hebbian Plasticity.
arXiv Detail & Related papers (2020-06-30T06:42:19Z)
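
The "Discriminability-enforcing loss" entry above describes its Gini-inspired loss only at a high level; the sketch below is a purely hypothetical reading of such a penalty, not that paper's actual formulation. The per-sample softmax over feature units and the 0.1 weighting are assumptions made for illustration.

```python
# Hypothetical Gini-impurity-style penalty on high-level features (NOT the cited paper's exact method):
# 1 - sum_i p_i^2 is small only when a feature distribution concentrates its mass, i.e. is "pure".
import torch
import torch.nn.functional as F

def gini_impurity_penalty(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, num_features) high-level activations; returns mean Gini impurity."""
    p = F.softmax(features, dim=1)          # per-sample distribution over feature units (an assumption)
    gini = 1.0 - (p * p).sum(dim=1)         # Gini impurity per sample, low when activation mass is concentrated
    return gini.mean()

# Hypothetical usage: add the penalty to the usual cross-entropy objective with a small weight.
logits = torch.randn(8, 10, requires_grad=True)   # toy stand-in for high-level features
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(logits, labels) + 0.1 * gini_impurity_penalty(logits)
loss.backward()
```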