Towards Demystifying Representation Learning with Non-contrastive
Self-supervision
- URL: http://arxiv.org/abs/2110.04947v1
- Date: Mon, 11 Oct 2021 00:48:05 GMT
- Title: Towards Demystifying Representation Learning with Non-contrastive
Self-supervision
- Authors: Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian
- Abstract summary: Non-contrastive methods of self-supervised learning learn representations by minimizing the distance between two views of the same image.
Tian et al. (2021) made an initial attempt on the first question and proposed DirectPred that sets the predictor directly.
We show that in a simple linear network, DirectSet($\alpha$) provably learns a desirable projection matrix and also reduces the sample complexity on downstream tasks.
- Score: 82.80118139087676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-contrastive methods of self-supervised learning (such as BYOL and
SimSiam) learn representations by minimizing the distance between two views of
the same image. These approaches have achieved remarkable performance in
practice, but it is not well understood 1) why these methods do not collapse to
the trivial solutions and 2) how the representation is learned. Tian et al.
(2021) made an initial attempt on the first question and proposed DirectPred
that sets the predictor directly. In our work, we analyze a generalized version
of DirectPred, called DirectSet($\alpha$). We show that in a simple linear
network, DirectSet($\alpha$) provably learns a desirable projection matrix and
also reduces the sample complexity on downstream tasks. Our analysis suggests
that weight decay acts as an implicit threshold that discards features with
high variance under augmentation and keeps features with low variance.
Inspired by our theory, we simplify DirectPred by removing the expensive
eigen-decomposition step. On CIFAR-10, CIFAR-100, STL-10 and ImageNet,
DirectCopy, our simpler and more computationally efficient algorithm, rivals or
even outperforms DirectPred.
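To make the two predictor-setting rules above concrete, here is a minimal sketch (not the authors' released code) for a linear predictor acting on d-dimensional projector outputs: DirectPred builds the predictor from an eigen-decomposition of an exponential-moving-average estimate of the output correlation matrix, while DirectCopy skips the eigen-decomposition and uses a regularized copy of that matrix directly. The EMA rate `rho`, the regularizer `eps`, and the final normalization are illustrative assumptions.

```python
import torch

def update_correlation(F, z, rho=0.3):
    # EMA estimate of the correlation matrix of projector outputs z: (batch, d)
    return rho * F + (1 - rho) * (z.T @ z) / z.shape[0]

def directpred_predictor(F, eps=1e-4):
    # DirectPred-style: eigen-decompose F and rescale its spectrum
    s, U = torch.linalg.eigh(F)                    # eigenvalues (ascending), eigenvectors
    p = (s.clamp(min=0.0) / s.max()).sqrt() + eps  # rescaled spectrum
    return U @ torch.diag(p) @ U.T

def directcopy_predictor(F, eps=1e-4):
    # DirectCopy-style: no eigen-decomposition, just a regularized copy of F
    Wp = F + eps * torch.eye(F.shape[0])
    return Wp / torch.linalg.norm(Wp)              # assumed Frobenius normalization
```

In this picture the only difference between the two rules is whether the spectrum of the correlation matrix is explicitly rescaled, which is why dropping the eigen-decomposition can preserve accuracy while saving compute.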
Related papers
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
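The alignment idea described above can be pictured with a short sketch: project the denoiser's noisy hidden states and pull them toward features from a frozen pretrained encoder. The projector, the per-token cosine loss, and all names here are assumptions for illustration, not the paper's exact objective.

```python
import torch.nn.functional as F

def alignment_loss(hidden_states, clean_features, projector):
    # hidden_states: (B, N, D_h) from the denoising network at some layer
    # clean_features: (B, N, D_e) from a frozen pretrained visual encoder
    projected = projector(hidden_states)                        # (B, N, D_e)
    cos = F.cosine_similarity(projected, clean_features, dim=-1)
    return -cos.mean()                                          # added to the diffusion loss
```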
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
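For context, the quantity being controlled, the singular values of a convolutional layer, can be computed exactly for circular padding via an FFT of the kernel (following Sedghi et al., 2019). The sketch below shows that standard computation, not the method proposed in the paper above.

```python
import torch

def conv_singular_values(kernel, input_hw):
    # kernel: (c_out, c_in, k, k); input_hw: (H, W) spatial size of the input
    # Singular values of the circularly padded convolution operator.
    transforms = torch.fft.fft2(kernel, s=input_hw)    # (c_out, c_in, H, W)
    transforms = transforms.permute(2, 3, 0, 1)        # (H, W, c_out, c_in)
    return torch.linalg.svdvals(transforms).flatten()  # all singular values
```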
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- spred: Solving $L_1$ Penalty with SGD [6.2255027793924285]
We propose to minimize a differentiable objective with an $L_1$ penalty using a simple reparametrization.
We prove that the reparametrization trick is completely "benign" for a generic nonconvex function.
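The reparametrization can be sketched as follows: each weight is written as an elementwise product w = u * v, after which ordinary L2 weight decay on (u, v) acts like an L1 penalty on w, so vanilla SGD produces sparsity. Names and the layer structure are illustrative, not the paper's reference implementation.

```python
import torch

class RedundantLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.u = torch.nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.v = torch.nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # effective weight w = u * v; weight decay on u and v penalizes
        # ||u||^2 + ||v||^2, whose minimum over u * v = w equals 2 * ||w||_1
        return torch.nn.functional.linear(x, self.u * self.v, self.bias)

# e.g. optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```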
arXiv Detail & Related papers (2022-10-03T20:07:51Z)
- Minimalistic Unsupervised Learning with the Sparse Manifold Transform [20.344274392350094]
We describe a minimalistic and interpretable method for unsupervised learning that achieves performance close to the SOTA SSL methods.
With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST.
With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100.
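The KNN top-1 numbers quoted above refer to the standard protocol of classifying each test point by its nearest neighbours among frozen training representations; a minimal cosine-similarity version is sketched below (the paper's exact k and distance may differ).

```python
import numpy as np

def knn_top1(train_feats, train_labels, test_feats, test_labels, k=1):
    # labels are assumed to be integer class indices
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                              # cosine similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]          # k nearest neighbours
    neighbour_labels = np.take(train_labels, nn_idx)   # (n_test, k)
    preds = np.apply_along_axis(lambda r: np.bincount(r).argmax(), 1, neighbour_labels)
    return float((preds == test_labels).mean())
```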
arXiv Detail & Related papers (2022-09-30T06:38:30Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework [43.76337849044254]
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations.
Various works have been proposed to tackle self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
arXiv Detail & Related papers (2021-12-09T18:59:57Z)
- MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [19.5917119072985]
We model contrastive learning into a binary classification problem to predict if a pair is positive or not.
The proposed method outperforms the state-of-the-art algorithms on benchmark datasets like STL-10, CIFAR-10, CIFAR-100.
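A minimal sketch of the binary-classification view described above: pairwise similarities between two augmented batches are treated as logits, positives on the diagonal and negatives elsewhere, trained with binary cross-entropy. The temperature and loss details are assumptions, not the exact MIO objective.

```python
import torch
import torch.nn.functional as F

def binary_contrastive_loss(z1, z2, temperature=0.1):
    # z1, z2: (B, D) embeddings of two augmented views of the same batch
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                    # (B, B) pair similarities
    labels = torch.eye(z1.shape[0], device=z1.device)   # 1 = positive pair, 0 = negative
    return F.binary_cross_entropy_with_logits(logits, labels)
```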
arXiv Detail & Related papers (2021-11-24T17:51:29Z)
- Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows us to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z)
- Understanding self-supervised Learning Dynamics without Contrastive Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
Recent non-contrastive methods, such as BYOL and SimSiam, show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
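In the simple linear setting referred to above, the non-contrastive loss can be sketched as an online branch with a predictor matched against a stop-gradient target branch on a second augmented view; this illustration omits the EMA target network and other details of the paper's setup.

```python
import torch
import torch.nn.functional as F

def non_contrastive_loss(x1, x2, W, Wp):
    # x1, x2: (B, d) two augmented views; W: projection matrix; Wp: predictor
    online = F.normalize(x1 @ W.T @ Wp.T, dim=1)      # predictor on the online branch
    target = F.normalize((x2 @ W.T).detach(), dim=1)  # stop-gradient target branch
    return -(online * target).sum(dim=1).mean()       # negative cosine similarity
```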
arXiv Detail & Related papers (2021-02-12T22:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.