Towards Demystifying Representation Learning with Non-contrastive
Self-supervision
- URL: http://arxiv.org/abs/2110.04947v1
- Date: Mon, 11 Oct 2021 00:48:05 GMT
- Title: Towards Demystifying Representation Learning with Non-contrastive
Self-supervision
- Authors: Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian
- Abstract summary: Non-contrastive methods of self-supervised learning learn representations by minimizing the distance between two views of the same image.
Tian et al. (2021) made an initial attempt on the first question and proposed DirectPred that sets the predictor directly.
We show that in a simple linear network, DirectSet($\alpha$) provably learns a desirable projection matrix and also reduces the sample complexity on downstream tasks.
- Score: 82.80118139087676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-contrastive methods of self-supervised learning (such as BYOL and
SimSiam) learn representations by minimizing the distance between two views of
the same image. These approaches have achieved remarkable performance in
practice, but it is not well understood 1) why these methods do not collapse to
the trivial solutions and 2) how the representation is learned. Tian et al.
(2021) made an initial attempt on the first question and proposed DirectPred
that sets the predictor directly. In our work, we analyze a generalized version
of DirectPred, called DirectSet($\alpha$). We show that in a simple linear
network, DirectSet($\alpha$) provably learns a desirable projection matrix and
also reduces the sample complexity on downstream tasks. Our analysis suggests
that weight decay acts as an implicit threshold that discards the features with
high variance under augmentation, and keeps the features with low variance.
Inspired by our theory, we simplify DirectPred by removing the expensive
eigen-decomposition step. On CIFAR-10, CIFAR-100, STL-10 and ImageNet,
DirectCopy, our simpler and more computationally efficient algorithm, rivals or
even outperforms DirectPred.
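A minimal NumPy sketch contrasting the two predictor-setting rules (the `eps` value, the batch shape, and the eigenvalue normalization follow one common variant and are illustrative, not the paper's exact recipe):

```python
import numpy as np

def directpred_predictor(z, eps=0.3):
    """Set the predictor from an eigen-decomposition of the correlation
    matrix of the projections z (n, d), as in DirectPred. The eigenvalue
    normalization and eps are one common variant, shown for illustration."""
    F = z.T @ z / len(z)                 # d x d correlation matrix
    s, U = np.linalg.eigh(F)             # the expensive eigen-decomposition
    s = np.clip(s, 0.0, None) / s.max()  # normalize eigenvalues to [0, 1]
    return U @ np.diag(np.sqrt(s) + eps) @ U.T

def directcopy_predictor(z, eps=0.3):
    """DirectCopy: skip the eigen-decomposition and use the normalized
    correlation matrix itself, plus a small multiple of the identity."""
    F = z.T @ z / len(z)
    return F / np.linalg.norm(F, ord=2) + eps * np.eye(F.shape[1])
```

Both rules return a symmetric, positive-definite predictor; DirectCopy only needs a matrix norm rather than a full spectral decomposition.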
Related papers
- Jacobian Descent for Multi-Objective Optimization [0.6138671548064355]
We formalize Jacobian descent (JD) as a generalization of gradient descent for vector-valued functions.
In particular, the update should not conflict with any objective and should scale proportionally to the norm of each gradient.
Most notably, we introduce instance-wise risk minimization (IWRM), a learning paradigm in which the loss of each training example is considered a separate objective.
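The non-conflicting-update idea can be illustrated with a toy aggregator (a PCGrad-style projection used here as a simplified stand-in for the paper's aggregator, shown for two objectives):

```python
import numpy as np

def aggregate_non_conflicting(J):
    """Aggregate a Jacobian J (one row per objective's gradient) into a
    single update direction that does not conflict with any row. This is
    a PCGrad-style projection used as a simplified stand-in for the
    paper's aggregator; the guarantee below is checked for two objectives."""
    g = [row.astype(float).copy() for row in J]
    for i in range(len(g)):
        for j in range(len(J)):
            if i != j:
                dot = g[i] @ J[j]
                if dot < 0.0:  # conflict: drop the component along J[j]
                    g[i] -= dot / (J[j] @ J[j]) * J[j]
    return np.mean(g, axis=0)
```

When the gradients already agree, this reduces to the plain mean; when they conflict, the conflicting components are projected out before averaging.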
arXiv Detail & Related papers (2024-06-23T22:06:25Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Minimalistic Unsupervised Learning with the Sparse Manifold Transform [20.344274392350094]
We describe a minimalistic and interpretable method for unsupervised learning that achieves performance close to the SOTA SSL methods.
With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST.
With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100.
arXiv Detail & Related papers (2022-09-30T06:38:30Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework [43.76337849044254]
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations.
Various works are proposed to deal with self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
arXiv Detail & Related papers (2021-12-09T18:59:57Z)
- MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [19.5917119072985]
We model contrastive learning into a binary classification problem to predict if a pair is positive or not.
The proposed method outperforms the state-of-the-art algorithms on benchmark datasets like STL-10, CIFAR-10, CIFAR-100.
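The binary-classification view can be sketched as a sigmoid cross-entropy over pair similarities (the cosine-similarity scoring and the absence of a temperature are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def pairwise_binary_loss(z1, z2, labels, eps=1e-12):
    """Contrastive learning as binary classification: predict whether each
    (z1[i], z2[i]) pair is positive (label 1) or negative (label 0) from
    the cosine similarity of the embeddings. Illustrative sketch only."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.sum(z1 * z2, axis=1)    # cosine similarity in [-1, 1]
    p = 1.0 / (1.0 + np.exp(-sim))   # sigmoid -> P(pair is positive)
    return -np.mean(labels * np.log(p + eps)
                    + (1 - labels) * np.log(1 - p + eps))
```

Identical views labeled positive incur a small loss, while the same aligned pairs labeled negative incur a large one, which is the supervision signal the classifier view provides.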
arXiv Detail & Related papers (2021-11-24T17:51:29Z)
- Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z)
- Understanding Dimensional Collapse in Contrastive Self-supervised Learning [57.98014222570084]
We show that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse.
Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector.
arXiv Detail & Related papers (2021-10-18T14:22:19Z)
- Understanding self-supervised Learning Dynamics without Contrastive Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
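The studied setting fits in a few lines (a minimal sketch assuming a linear online network `W`, a linear predictor `Wp`, a fixed target network `Wa` playing the stop-gradient role, and additive-noise augmentation; all names and hyperparameters are illustrative):

```python
import numpy as np

def noncontrastive_step(W, Wp, Wa, x, lr=0.05, noise=0.1, rng=None):
    """One gradient step of a BYOL/SimSiam-style non-contrastive loss in a
    linear network: the online branch (W, then predictor Wp) must predict
    the target branch's projection (Wa, held fixed, i.e. stop-gradient) of
    the other augmented view. Constant factors are folded into lr."""
    if rng is None:
        rng = np.random.default_rng()
    x1 = x + noise * rng.standard_normal(x.shape)  # two augmented views
    x2 = x + noise * rng.standard_normal(x.shape)
    z1 = x1 @ W.T
    p1 = z1 @ Wp.T                   # online prediction
    t2 = x2 @ Wa.T                   # stop-gradient target
    err = p1 - t2                    # (n, d) prediction residual
    n = len(x)
    grad_Wp = err.T @ z1 / n         # gradient of the squared error w.r.t. Wp
    grad_W = Wp.T @ err.T @ x1 / n   # gradient w.r.t. the online weights W
    return W - lr * grad_W, Wp - lr * grad_Wp
```

Running this update repeatedly drives the online branch's prediction toward the target branch's projection, which is the dynamic the paper analyzes.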
arXiv Detail & Related papers (2021-02-12T22:57:28Z)