Sharp Learning Bounds for Contrastive Unsupervised Representation Learning
- URL: http://arxiv.org/abs/2110.02501v1
- Date: Wed, 6 Oct 2021 04:29:39 GMT
- Title: Sharp Learning Bounds for Contrastive Unsupervised Representation Learning
- Authors: Han Bao, Yoshihiro Nagano, Kento Nozawa
- Abstract summary: This study establishes a downstream classification loss bound with a tight intercept in the negative sample size.
By regarding the contrastive loss as a downstream loss estimator, our theory improves the existing learning bounds substantially.
We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.
- Score: 4.017760528208122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive unsupervised representation learning (CURL) encourages data representations to place semantically similar pairs closer than randomly drawn negative samples, and has been successful in various domains such as vision, language, and graphs. Although recent theoretical studies have attempted to explain its success by upper-bounding a downstream classification loss with the contrastive loss, these bounds are still not sharp enough to explain an experimental fact: larger negative sample sizes improve classification performance. This study establishes a downstream classification loss bound with a tight intercept in the negative sample size. By regarding the contrastive loss as an estimator of the downstream loss, our theory not only improves the existing learning bounds substantially but also explains why downstream classification empirically improves with larger negative sample sizes: the estimation variance of the downstream loss decays as the negative sample size grows. We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.
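The abstract above treats the contrastive loss as an estimator of the downstream classification loss, with estimation variance that shrinks as the negative sample size grows. As a purely illustrative aid (not code from the paper), the following NumPy sketch shows a standard InfoNCE-style contrastive loss and how the negative sample size K enters it; the encoder features, cosine similarity, and temperature are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for a single anchor: the positive is
    pulled closer to the anchor than K = negatives.shape[0] random negatives."""
    def cos(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

    pos = cos(anchor, positive) / temperature
    neg = np.array([cos(anchor, n) for n in negatives]) / temperature
    # -log( exp(pos) / (exp(pos) + sum_k exp(neg_k)) )
    return -pos + np.log(np.exp(pos) + np.exp(neg).sum())

# Toy usage with random vectors standing in for an encoder's outputs.
d = 64
anchor = rng.normal(size=d)
positive = anchor + 0.1 * rng.normal(size=d)   # a "semantically similar" view
negatives = rng.normal(size=(256, d))          # K = 256 randomly drawn negatives
print(info_nce_loss(anchor, positive, negatives))

# The sum over negatives is K times a Monte Carlo average over the negative
# distribution, and the standard error of such an average decays as 1/sqrt(K),
# in line with the variance-reduction effect the abstract associates with
# larger negative sample sizes.
for K in (4, 64, 1024):
    sims = rng.normal(size=(500, K))           # stand-in similarity scores
    print(K, np.exp(sims).mean(axis=1).std().round(4))
```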
Related papers
- Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features.
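The entry above describes its key insight only at a high level, so the snippet below is a speculative sketch rather than the paper's actual loss: one generic way to pull features toward their assigned prototype while pushing them away from other prototypes, via a softmax over feature-prototype cosine similarities. The prototype assignments, normalization, and temperature are assumptions.

```python
import numpy as np

def prototype_metric_loss(features, proto_ids, prototypes, temperature=0.1):
    """Generic prototype-matching loss: each feature is attracted to its own
    prototype and repelled from the others via a softmax over similarities.

    features:   (N, d) L2-normalized embeddings
    proto_ids:  (N,)   index of the prototype assigned to each feature
    prototypes: (P, d) L2-normalized prototype vectors
    """
    logits = features @ prototypes.T / temperature                  # (N, P)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(features)), proto_ids].mean()

# Toy usage with random, hypothetical features and prototypes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
protos = rng.normal(size=(4, 16))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
print(prototype_metric_loss(feats, rng.integers(0, 4, size=8), protos))
```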
arXiv Detail & Related papers (2022-08-18T13:25:30Z)
- Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
arXiv Detail & Related papers (2022-05-03T21:29:59Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Investigating the Role of Negatives in Contrastive Representation Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one of these parameters: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Incremental False Negative Detection for Contrastive Learning [95.68120675114878]
We introduce a novel incremental false-negative detection method for self-supervised contrastive learning.
We discuss two strategies to explicitly remove the detected false negatives during contrastive learning.
Our proposed method outperforms other self-supervised contrastive learning frameworks on multiple benchmarks within a limited compute budget.
arXiv Detail & Related papers (2021-06-07T15:29:14Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Understanding Hard Negatives in Noise Contrastive Estimation [21.602701327267905]
We develop analytical tools to understand the role of hard negatives.
We derive a general form of the score function that unifies various architectures used in text retrieval.
arXiv Detail & Related papers (2021-04-13T14:42:41Z)
- Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses successfully constrain the clustering results of mini-batch samples at both the sample and class levels.
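The DCDC summary above describes the two views in words; the sketch below is a hedged NumPy illustration of that idea rather than the authors' implementation: the sample view contrasts rows of the predicted class distributions (one per sample) between a batch and its augmented version, and the class view applies the same contrast to the columns (one per class). The cosine similarity, temperature, and toy inputs are assumptions.

```python
import numpy as np

def info_nce_rows(a, b, temperature=0.5):
    """Contrast matching rows of a and b as positives against all other rows
    of b as negatives, using cosine similarity."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                                   # (n, n)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

def doubly_contrastive_loss(p, p_aug):
    """p, p_aug: (B, C) softmax outputs for a batch and its augmented views."""
    sample_view = info_nce_rows(p, p_aug)      # rows: per-sample class distributions
    class_view = info_nce_rows(p.T, p_aug.T)   # columns: per-class sample distributions
    return sample_view + class_view

# Toy usage with random predictions standing in for a clustering head.
rng = np.random.default_rng(2)
logits = rng.normal(size=(8, 5))
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
p_aug = np.exp(logits + 0.1 * rng.normal(size=(8, 5)))
p_aug /= p_aug.sum(axis=1, keepdims=True)
print(doubly_contrastive_loss(p, p_aug))
```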
arXiv Detail & Related papers (2021-03-09T15:15:32Z)
- A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data.
We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision.
Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.