Understanding self-supervised Learning Dynamics without Contrastive
Pairs
- URL: http://arxiv.org/abs/2102.06810v1
- Date: Fri, 12 Feb 2021 22:57:28 GMT
- Title: Understanding self-supervised Learning Dynamics without Contrastive
Pairs
- Authors: Yuandong Tian and Xinlei Chen and Surya Ganguli
- Abstract summary: Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
Recent approaches like BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
- Score: 72.1743263777693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive approaches to self-supervised learning (SSL) learn
representations by minimizing the distance between two augmented views of the
same data point (positive pairs) and maximizing the same from different data
points (negative pairs). However, recent approaches like BYOL and SimSiam show
remarkable performance without negative pairs, raising a fundamental
theoretical question: how can SSL with only positive pairs avoid
representational collapse? We study the nonlinear learning dynamics of
non-contrastive SSL in simple linear networks. Our analysis yields conceptual
insights into how non-contrastive SSL methods learn, how they avoid
representational collapse, and how multiple factors, like predictor networks,
stop-gradients, exponential moving averages, and weight decay all come into
play. Our simple theory recapitulates the results of real-world ablation
studies in both STL-10 and ImageNet. Furthermore, motivated by our theory we
propose a novel approach that directly sets the predictor based on the
statistics of its inputs. In the case of linear predictors, our approach
outperforms gradient training of the predictor by 5% and on ImageNet it
performs comparably with more complex two-layer non-linear predictors that
employ BatchNorm. Code is released at
https://github.com/facebookresearch/luckmatters/tree/master/ssl.
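For intuition, here is a minimal sketch of the "directly set the predictor from the statistics of its inputs" idea for a linear predictor. It is not the released implementation (see the repository above); the class name, EMA rate `rho`, eigenvalue regularizer `eps`, and normalization are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch only: set a linear predictor from an eigendecomposition of a running
# correlation matrix of its inputs, instead of training it by gradient descent.
import numpy as np

class DirectPredictorSketch:
    def __init__(self, dim, rho=0.3, eps=0.1):
        self.F = np.eye(dim)      # running correlation matrix of predictor inputs
        self.rho = rho            # EMA rate for the correlation estimate (assumed)
        self.eps = eps            # regularizer added to small eigenvalues (assumed)
        self.W_p = np.eye(dim)    # linear predictor weights

    def update(self, feats):
        """feats: (batch, dim) online-network outputs fed into the predictor."""
        batch_corr = feats.T @ feats / feats.shape[0]
        self.F = (1 - self.rho) * self.F + self.rho * batch_corr
        s, U = np.linalg.eigh(self.F)            # F = U diag(s) U^T
        s = np.clip(s, 0.0, None)
        p = np.sqrt(s / s.max()) + self.eps      # square-root spectrum, regularized
        self.W_p = U @ np.diag(p) @ U.T          # symmetric linear predictor
        return self.W_p

    def __call__(self, feats):
        return feats @ self.W_p.T
```

In a BYOL/SimSiam-style training loop, one would call `update()` on the online features at each step and use the resulting predictor in the usual positive-pair loss, rather than backpropagating through predictor weights.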
Related papers
- Understanding Representation Learnability of Nonlinear Self-Supervised
Learning [13.965135660149212]
Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks.
Our paper is the first to accurately analyze the learning results of the nonlinear SSL model.
arXiv Detail & Related papers (2024-01-06T13:23:26Z)
- Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval.
One possible approach is self-supervised learning (SSL).
We are the first to introduce non-contrastive SSL for well-logging data.
arXiv Detail & Related papers (2022-09-28T13:27:10Z)
- Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Whitening for Self-Supervised Representation Learning [129.57407186848917]
We propose a new loss function for self-supervised representation learning (SSL) based on the whitening of latent-space features.
Our solution does not require asymmetric networks and is conceptually simple (a minimal sketch of the whitening idea follows this list).
arXiv Detail & Related papers (2020-07-13T12:33:25Z)
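As referenced in the whitening entry above, here is an illustrative sketch of whitening latent features before a pairwise MSE between two augmented views. It is a reconstruction under assumptions (ZCA whitening, unit-norm projection, plain MSE); the paper's exact whitening operator and loss details may differ.

```python
# Sketch only: whiten a batch of embeddings to identity covariance, then pull
# the two augmented views together with an MSE loss (no negative pairs).
import numpy as np

def whiten(z, eps=1e-5):
    """ZCA-whiten a (batch, dim) matrix: zero mean, (approximately) identity covariance."""
    z = z - z.mean(axis=0, keepdims=True)
    cov = z.T @ z / (z.shape[0] - 1)
    s, U = np.linalg.eigh(cov)
    W = U @ np.diag(1.0 / np.sqrt(s + eps)) @ U.T   # ZCA whitening matrix
    return z @ W

def whitened_mse_loss(z1, z2):
    """MSE between whitened, unit-normalized embeddings of two views of the same batch."""
    w1, w2 = whiten(z1), whiten(z2)
    w1 = w1 / np.linalg.norm(w1, axis=1, keepdims=True)
    w2 = w2 / np.linalg.norm(w2, axis=1, keepdims=True)
    return np.mean(np.sum((w1 - w2) ** 2, axis=1))
```

The whitening step plays the role that negative pairs or asymmetric predictor/stop-gradient tricks play elsewhere: it prevents all embeddings from collapsing to a single point while the MSE aligns positive pairs.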