Chaos is a Ladder: A New Theoretical Understanding of Contrastive
Learning via Augmentation Overlap
- URL: http://arxiv.org/abs/2203.13457v1
- Date: Fri, 25 Mar 2022 05:36:26 GMT
- Title: Chaos is a Ladder: A New Theoretical Understanding of Contrastive
Learning via Augmentation Overlap
- Authors: Yifei Wang, Qi Zhang, Yisen Wang, Jiansheng Yang, Zhouchen Lin
- Abstract summary: We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
- Score: 64.60460828425502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, contrastive learning has risen to be a promising approach for
large-scale self-supervised learning. However, theoretical understanding of how
it works is still unclear. In this paper, we propose a new guarantee on the
downstream performance without resorting to the conditional independence
assumption that is widely adopted in previous work but hardly holds in
practice. Our new theory hinges on the insight that the support of different
intra-class samples will become more overlapped under aggressive data
augmentations, thus simply aligning the positive samples (augmented views of
the same sample) could make contrastive learning cluster intra-class samples
together. Based on this augmentation overlap perspective, theoretically, we
obtain asymptotically closed bounds for downstream performance under weaker
assumptions, and empirically, we propose an unsupervised model selection metric
ARC that aligns well with downstream accuracy. Our theory suggests an
alternative understanding of contrastive learning: the role of aligning
positive samples is more like a surrogate task than an ultimate goal, and the
overlapped augmented views (i.e., the chaos) create a ladder for contrastive
learning to gradually learn class-separated representations. The code for
computing ARC is available at https://github.com/zhangq327/ARC.
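The key quantity behind the augmentation-overlap argument is how well an encoder aligns the augmented views of the same sample: once augmentations are aggressive enough that intra-class supports overlap, low alignment error alone already pulls intra-class samples together. As a hedged illustration, the sketch below measures positive-pair alignment for a generic encoder and augmentation; the `encoder` and `augment` names are placeholders, and this is not the paper's ARC metric, whose actual implementation is in the repository linked above.

```python
# Minimal sketch (PyTorch): average positive-pair alignment under a stochastic
# augmentation, i.e. how close the two augmented views of each sample land in
# embedding space. Illustration only -- NOT the official ARC implementation,
# which is available at https://github.com/zhangq327/ARC.
import torch
import torch.nn.functional as F

def positive_alignment(encoder, augment, data: torch.Tensor) -> float:
    """Mean squared distance between L2-normalized embeddings of two augmented
    views of the same samples (lower = positives are better aligned)."""
    with torch.no_grad():
        v1 = F.normalize(encoder(augment(data)), dim=1)
        v2 = F.normalize(encoder(augment(data)), dim=1)
        return (v1 - v2).pow(2).sum(dim=1).mean().item()

# Toy usage with placeholder encoder and augmentation (illustrative only).
if __name__ == "__main__":
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32, 16))
    augment = lambda x: x + 0.1 * torch.randn_like(x)  # stand-in for real augmentations
    x = torch.randn(128, 32)
    print(f"positive-pair alignment: {positive_alignment(encoder, augment, x):.4f}")
```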
Related papers
- BECLR: Batch Enhanced Contrastive Few-Shot Learning [1.450405446885067]
Unsupervised few-shot learning aspires to bridge this gap by discarding the reliance on annotations at training time.
We propose a novel Dynamic Clustered mEmory (DyCE) module to promote a highly separable latent representation space.
We then tackle the somewhat overlooked yet critical issue of sample bias at the few-shot inference stage.
arXiv Detail & Related papers (2024-02-04T10:52:43Z) - Integrating Prior Knowledge in Contrastive Learning with Kernel [4.050766659420731]
We use kernel theory to propose a novel loss, called decoupled uniformity, that i) allows the integration of prior knowledge and ii) removes the negative-positive coupling in the original InfoNCE loss (a minimal InfoNCE sketch illustrating this coupling follows the list below).
In an unsupervised setting, we empirically demonstrate that CL benefits from generative models to improve its representation both on natural and medical images.
arXiv Detail & Related papers (2022-06-03T15:43:08Z) - Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
arXiv Detail & Related papers (2022-05-03T21:29:59Z) - Improving Self-supervised Learning with Automated Unsupervised Outlier
Arbitration [83.29856873525674]
We introduce a lightweight latent variable model UOTA, targeting the view sampling issue for self-supervised learning.
Our method directly generalizes to many mainstream self-supervised learning approaches.
arXiv Detail & Related papers (2021-12-15T14:05:23Z) - A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie in the same low-dimensional subspace.
Empirical evidence shows that the proposed algorithm clearly surpasses the state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering are the major obstacles to this efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z) - Understanding self-supervised Learning Dynamics without Contrastive
Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
Non-contrastive SSL methods such as BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
arXiv Detail & Related papers (2021-02-12T22:57:28Z)
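The decoupled-uniformity entry above refers to the negative-positive coupling of the original InfoNCE loss: each positive similarity sits in the same softmax denominator as the negatives, so their gradients are entangled. The sketch below shows only the standard (one-directional) InfoNCE formulation, not the decoupled variant from that paper; the batch layout and temperature value are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of the standard InfoNCE loss for two augmented views.
# The positive similarity sits in the same log-softmax denominator as the
# negatives -- the "negative-positive coupling" that the decoupled-uniformity
# paper above removes. Batch layout and temperature are illustrative choices.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two views of the same N samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N); diagonal entries are the positives
    targets = torch.arange(z1.size(0))      # positive index for each row
    return F.cross_entropy(logits, targets) # softmax couples positives and negatives

# Toy usage (illustrative only).
if __name__ == "__main__":
    z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
    print(f"InfoNCE: {info_nce(z1, z2).item():.4f}")
```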