Understanding Negative Samples in Instance Discriminative
Self-supervised Representation Learning
- URL: http://arxiv.org/abs/2102.06866v1
- Date: Sat, 13 Feb 2021 05:46:33 GMT
- Title: Understanding Negative Samples in Instance Discriminative
Self-supervised Representation Learning
- Authors: Kento Nozawa, Issei Sato
- Abstract summary: Self-supervised representation learning commonly uses more negative samples than the number of supervised classes in practice.
We theoretically explain this empirical result regarding negative samples.
We empirically confirm our analysis by conducting numerical experiments on CIFAR-10/100 datasets.
- Score: 29.583194697391253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance discriminative self-supervised representation learning has
attracted attention thanks to its unsupervised nature and informative feature
representation for downstream tasks. Self-supervised representation learning
commonly uses more negative samples than the number of supervised classes in
practice. However, there is an inconsistency in the existing analysis;
theoretically, a large number of negative samples degrade supervised
performance, while empirically, they improve the performance. We theoretically
explain this empirical result regarding negative samples. We empirically
confirm our analysis by conducting numerical experiments on CIFAR-10/100
datasets.
Related papers
- Harnessing Discrete Representations For Continual Reinforcement Learning [8.61539229796467]
We investigate the advantages of representing observations as vectors of categorical values within the context of reinforcement learning.
We find that, when compared to traditional continuous representations, world models learned over discrete representations accurately model more of the world with less capacity.
arXiv Detail & Related papers (2023-12-02T18:55:26Z)
- Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
arXiv Detail & Related papers (2022-05-03T21:29:59Z)
- Understanding Contrastive Learning Requires Incorporating Inductive Biases [64.56006519908213]
Recent attempts to theoretically explain the success of contrastive learning on downstream tasks prove guarantees depending on properties of augmentations and the value of contrastive loss of representations.
We demonstrate that such analyses ignore inductive biases of the function class and training algorithm, even provably leading to vacuous guarantees in some settings.
arXiv Detail & Related papers (2022-02-28T18:59:20Z)
- Unsupervised Embedding Learning from Uncertainty Momentum Modeling [37.674449317054716]
We propose a novel solution to explicitly model and explore the uncertainty of the given unlabeled learning samples.
We leverage this uncertainty-modeling momentum during learning, which helps tackle outliers.
arXiv Detail & Related papers (2021-07-19T14:06:19Z)
- Investigating the Role of Negatives in Contrastive Representation Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one of these parameters: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample over multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Odd-One-Out Representation Learning [1.6822770693792826]
We show that a weakly-supervised downstream task based on odd-one-out observations is suitable for model selection.
We also show that a bespoke metric-learning VAE model which performs highly on this task also out-performs other standard unsupervised and a weakly-supervised disentanglement model.
arXiv Detail & Related papers (2020-12-14T22:01:15Z)
- A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data.
We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision.
Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)
- Weakly-Supervised Disentanglement Without Compromises [53.55580957483103]
Intelligent agents should be able to learn useful representations by observing changes in their environment.
We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation.
We show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations.
arXiv Detail & Related papers (2020-02-07T16:39:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.