How Does SimSiam Avoid Collapse Without Negative Samples? A Unified
Understanding with Self-supervised Contrastive Learning
- URL: http://arxiv.org/abs/2203.16262v1
- Date: Wed, 30 Mar 2022 12:46:31 GMT
- Title: How Does SimSiam Avoid Collapse Without Negative Samples? A Unified
Understanding with Self-supervised Contrastive Learning
- Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D.
Yoo, In So Kweon
- Abstract summary: To avoid collapse in self-supervised learning, a contrastive loss is widely used but often requires a large number of negative samples.
A recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method to avoid collapse.
- Score: 79.94590011183446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To avoid collapse in self-supervised learning (SSL), a contrastive loss is
widely used but often requires a large number of negative samples. A recent work has
attracted significant attention for providing a minimalist simple Siamese (SimSiam)
method that avoids collapse without negative samples yet achieves competitive
performance. However, how SimSiam avoids collapse without negative samples remains
not fully clear, so our investigation starts by revisiting the explanatory claims in
the original SimSiam paper. After refuting those claims, we introduce a vector
decomposition for analyzing collapse, based on a gradient analysis of the
$l_2$-normalized representation vector. This yields a unified perspective on how
negative samples and SimSiam alleviate collapse, and it comes at a timely point for
understanding recent progress in SSL.
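As a rough, minimal sketch of the kind of gradient analysis the abstract refers to (not the authors' exact formulation; the PyTorch setup and the batch-mean center/residual split below are illustrative assumptions), the snippet shows that backpropagating through $l_2$ normalization keeps only the gradient component orthogonal to the normalized vector, and that a batch of normalized representations can be viewed as a shared center vector plus per-sample residuals whose vanishing corresponds to complete collapse.

```python
# Minimal sketch (illustrative, not the paper's exact analysis) of two facts the
# vector-decomposition argument builds on.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# 1) Gradient through l2 normalization: for z = h / ||h||, the Jacobian is
#    (I - z z^T) / ||h||, so the gradient that reaches h is orthogonal to z.
h = torch.randn(8, requires_grad=True)
z = F.normalize(h, dim=0)
g = torch.randn(8)                      # an arbitrary upstream gradient on z
z.backward(g)
print(torch.dot(h.grad, z.detach()))    # ~0: the radial (norm-changing) part is removed

# 2) Center/residual view of a batch of normalized representations
#    (assumed split: z_i = o + r_i, with o the batch mean).
Z = F.normalize(torch.randn(256, 128), dim=1)
o = Z.mean(dim=0)                       # center vector shared by the batch
r = Z - o                               # per-sample residual vectors
# Complete collapse means all z_i coincide, i.e. the residuals vanish.
print(r.norm(dim=1).mean())             # residual magnitude; collapse drives this to 0
```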
Related papers
- SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval [126.22182758461244]
We show that according to the measured relevance scores, the negatives ranked around the positives are generally more informative and less likely to be false negatives.
We propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.
arXiv Detail & Related papers (2022-10-21T07:18:05Z)
- What shapes the loss landscape of self-supervised learning? [10.896567381206715]
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
We provide answers by thoroughly analyzing SSL loss landscapes for a linear model.
We derive an analytically tractable theory of SSL landscape and show that it accurately captures an array of collapse phenomena and identifies their causes.
arXiv Detail & Related papers (2022-10-02T21:46:16Z)
- Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
- Contrasting the landscape of contrastive and non-contrastive learning [25.76544128487728]
We show that even on simple data models, non-contrastive losses have a preponderance of non-collapsed bad minima.
We show that the training process does not avoid these minima.
arXiv Detail & Related papers (2022-03-29T16:08:31Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Understanding self-supervised Learning Dynamics without Contrastive Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point while pushing apart views of different data points.
Non-contrastive methods such as BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
arXiv Detail & Related papers (2021-02-12T22:57:28Z)
- Contrastive Learning with Hard Negative Samples [80.12117639845678]
We develop a new family of unsupervised sampling methods for selecting hard negative samples.
A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible.
The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead (a rough sketch of such reweighting follows below).
arXiv Detail & Related papers (2020-10-09T14:18:53Z)
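As a hedged sketch of what hard-negative reweighting can look like inside an InfoNCE-style loss (an illustrative assumption, not necessarily the cited paper's exact estimator; the function name and the `tau`/`beta` parameters are hypothetical), negatives can be up-weighted according to their similarity to the anchor:

```python
# Illustrative sketch of hard-negative reweighting in an InfoNCE-style loss;
# the exact estimator of the cited paper may differ.
import torch
import torch.nn.functional as F

def hard_negative_info_nce(anchor, positive, negatives, tau=0.5, beta=1.0):
    """anchor, positive: tensors of shape (d,); negatives: tensor of shape (n, d)."""
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)

    pos = torch.exp(anchor @ positive / tau)
    neg_sims = negatives @ anchor / tau      # similarity of each negative to the anchor
    neg = torch.exp(neg_sims)

    # Up-weight the negatives most similar to the anchor ("hard" negatives);
    # beta controls how aggressive the reweighting is.
    weights = torch.softmax(beta * neg_sims, dim=0) * negatives.shape[0]
    neg_term = (weights * neg).sum()

    return -torch.log(pos / (pos + neg_term))

# Example usage with random embeddings.
loss = hard_negative_info_nce(torch.randn(128), torch.randn(128), torch.randn(1024, 128))
print(loss)
```

With `beta = 0` the weights are uniform and the expression reduces to a standard InfoNCE loss, which is consistent with the claim that such a method needs only a few additional lines of code.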
This list is automatically generated from the titles and abstracts of the papers in this site.