Understanding Collapse in Non-Contrastive Learning
- URL: http://arxiv.org/abs/2209.15007v1
- Date: Thu, 29 Sep 2022 17:59:55 GMT
- Title: Understanding Collapse in Non-Contrastive Learning
- Authors: Alexander C. Li, Alexei A. Efros, Deepak Pathak
- Abstract summary: We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
- Score: 122.2499276246997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive methods have led a recent surge in the performance of
self-supervised representation learning (SSL). Recent methods like BYOL or
SimSiam purportedly distill these contrastive methods down to their essence,
removing bells and whistles, including the negative examples, that do not
contribute to downstream performance. These "non-contrastive" methods work
surprisingly well without using negatives even though the global minimum lies
at trivial collapse. We empirically analyze these non-contrastive methods and
find that SimSiam is extraordinarily sensitive to dataset and model size. In
particular, SimSiam representations undergo partial dimensional collapse if the
model is too small relative to the dataset size. We propose a metric to measure
the degree of this collapse and show that it can be used to forecast the
downstream task performance without any fine-tuning or labels. We further
analyze architectural design choices and their effect on the downstream
performance. Finally, we demonstrate that shifting to a continual learning
setting acts as a regularizer and prevents collapse, and a hybrid between
continual and multi-epoch training can improve linear probe accuracy by as many
as 18 percentage points using ResNet-18 on ImageNet.
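The abstract does not spell out the proposed collapse metric, but a common way to quantify partial dimensional collapse is to check how much of the embedding variance concentrates in a few principal directions. The sketch below is a generic proxy in that spirit, not the paper's exact definition; the function name, the 99% variance threshold, and the NumPy implementation are illustrative assumptions.

```python
# Hedged sketch: a generic proxy for dimensional collapse, not the paper's
# exact metric. It measures what fraction of the embedding dimensions is
# needed to explain 99% of the feature variance.
import numpy as np

def collapse_proxy(embeddings: np.ndarray, threshold: float = 0.99) -> float:
    """embeddings: (num_samples, dim) features, e.g. from a frozen SimSiam backbone."""
    # Center the features and form the covariance matrix.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(centered) - 1)

    # Eigenvalues of the covariance give the variance along each principal direction.
    eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)  # sorted descending
    explained = np.cumsum(eigvals) / eigvals.sum()               # cumulative variance ratio

    # Number of directions needed to reach the threshold, as a fraction of dim.
    k = int(np.searchsorted(explained, threshold)) + 1
    return k / embeddings.shape[1]
```

Values near 1 mean the variance is spread across most dimensions; values well below 1 indicate partial dimensional collapse, the regime the paper links to weaker downstream (linear probe) performance and uses to forecast it without labels or fine-tuning.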
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previous tasks when learning new tasks.
However, storing such data is often impractical due to memory constraints or data privacy concerns.
As an alternative, data-free replay methods synthesize samples by inverting the classification model instead of storing real data.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z) - An Empirical Study on Disentanglement of Negative-free Contrastive Learning [9.52192693120765]
We examine several negative-free contrastive learning methods to study the disentanglement properties of this family of self-supervised methods.
We propose a new disentanglement metric based on Mutual Information between representation and data factors.
Our study shows that the investigated methods can learn a well-disentangled subset of the representation.
arXiv Detail & Related papers (2022-06-09T20:25:34Z) - How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning [79.94590011183446]
To avoid collapse in self-supervised learning, a contrastive loss is widely used but often requires a large number of negative samples.
A recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method that avoids collapse; a minimal sketch of this stop-gradient setup appears after this list.
arXiv Detail & Related papers (2022-03-30T12:46:31Z) - Contrasting the landscape of contrastive and non-contrastive learning [25.76544128487728]
We show that even on simple data models, non-contrastive losses have a preponderance of non-collapsed bad minima.
We show that the training process does not avoid these minima.
arXiv Detail & Related papers (2022-03-29T16:08:31Z) - Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding [14.295787044482136]
We present a momentum contrastive learning model with a negative sample queue for sentence embedding, namely MoCoSE.
We define a maximum traceable distance metric, through which we learn to what extent text contrastive learning benefits from the historical information of negative samples.
Our experiments find that the best results are obtained when the maximum traceable distance is at a certain range, demonstrating that there is an optimal range of historical information for a negative sample queue.
arXiv Detail & Related papers (2022-02-26T08:29:25Z) - Understanding self-supervised Learning Dynamics without Contrastive Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
Recent methods such as BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
arXiv Detail & Related papers (2021-02-12T22:57:28Z) - SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner.
A key to the success of such contrastive learning methods is how positive and negative samples are drawn.
In this paper, we propose SCE for unsupervised network embedding that uses only negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
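For reference, the stop-gradient mechanism that the SimSiam entry above credits with avoiding collapse can be written in a few lines. The following is a simplified PyTorch rendering of the published SimSiam loss; the variable names and any surrounding training code are assumptions made for illustration.

```python
# Simplified sketch of the SimSiam non-contrastive loss: negative cosine
# similarity between a predictor output and a stop-gradient target.
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """p1, p2: predictor outputs for two augmented views.
    z1, z2: projector outputs for the same two views."""
    def d(p, z):
        z = z.detach()  # stop-gradient: the target branch receives no gradient
        return -F.cosine_similarity(p, z, dim=-1).mean()

    # Symmetrized over both view pairings, as in the SimSiam paper.
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```

Without the stop-gradient and the predictor asymmetry, this objective is trivially minimized by mapping every input to the same vector, which is exactly the complete collapse that the papers above analyze.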
This list is automatically generated from the titles and abstracts of the papers in this site.