What Should Not Be Contrastive in Contrastive Learning
- URL: http://arxiv.org/abs/2008.05659v2
- Date: Thu, 18 Mar 2021 21:08:52 GMT
- Title: What Should Not Be Contrastive in Contrastive Learning
- Authors: Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell
- Abstract summary: We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances.
Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces.
We use a multi-head network with a shared backbone that captures information across each augmentation and, on its own, outperforms all baselines on downstream tasks.
- Score: 110.14159883496859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent self-supervised contrastive methods have been able to produce
impressive transferable visual representations by learning to be invariant to
different data augmentations. However, these methods implicitly assume a
particular set of representational invariances (e.g., invariance to color), and
can perform poorly when a downstream task violates this assumption (e.g.,
distinguishing red vs. yellow cars). We introduce a contrastive learning
framework which does not require prior knowledge of specific, task-dependent
invariances. Our model learns to capture varying and invariant factors for
visual representations by constructing separate embedding spaces, each of which
is invariant to all but one augmentation. We use a multi-head network with a
shared backbone which captures information across each augmentation and alone
outperforms all baselines on downstream tasks. We further find that the
concatenation of the invariant and varying spaces performs best across all
tasks we investigate, including coarse-grained, fine-grained, and few-shot
downstream classification tasks, and various data corruptions.
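As a rough sketch of the multi-head design described in the abstract, the following PyTorch code illustrates a shared backbone feeding one projection head per embedding space. This is a minimal illustration under stated assumptions: the class and function names, the feature/projection dimensions, and the InfoNCE details are ours, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadProjector(nn.Module):
    """Shared backbone with one projection head per embedding space.

    Sketch of the idea in the abstract: one space is invariant to all
    augmentations; each remaining space k is invariant to all but
    augmentation k (e.g. color jitter, rotation), so the factor the
    other spaces discard is preserved there.
    """
    def __init__(self, backbone, feat_dim=2048, proj_dim=128, num_augs=3):
        super().__init__()
        self.backbone = backbone          # e.g. a torchvision ResNet-50 minus its fc layer
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, proj_dim))
            for _ in range(num_augs + 1)  # +1 for the fully invariant space
        ])

    def forward(self, x):
        h = torch.flatten(self.backbone(x), 1)
        # One L2-normalised embedding per space; concatenating them for
        # downstream tasks combines the invariant and varying factors.
        return [F.normalize(head(h), dim=1) for head in self.heads]

def info_nce(query, key, negatives, tau=0.07):
    """Standard InfoNCE loss, applied separately per embedding space."""
    pos = (query * key).sum(dim=1, keepdim=True)   # (N, 1) positive similarities
    neg = query @ negatives.t()                    # (N, K) negative similarities
    logits = torch.cat([pos, neg], dim=1) / tau
    targets = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, targets)
```

During training, which views count as positives differs per space: views sharing augmentation k act as positives only in the spaces meant to be invariant to k, which is how each head's space stays sensitive to exactly one factor.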
Related papers
- Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations [63.73044203154743]
Self-supervised representation learning often uses data augmentations to induce invariance to "style" attributes of the data.
It is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded.
We introduce a more principled approach that seeks to disentangle style features rather than discard them.
arXiv Detail & Related papers (2023-11-15T09:34:08Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Amortised Invariance Learning for Contrastive Self-Supervision [11.042648980854485]
We introduce the notion of amortised invariance learning for contrastive self-supervision.
We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements.
This provides an exciting perspective that opens up new horizons in the field of general purpose representation learning.
arXiv Detail & Related papers (2023-02-24T16:15:11Z)
- EquiMod: An Equivariance Module to Improve Self-Supervised Learning [77.34726150561087]
Self-supervised visual representation methods are closing the gap with supervised learning performance.
These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations.
We introduce EquiMod, a generic equivariance module that structures the learned latent space.
arXiv Detail & Related papers (2022-11-02T16:25:54Z)
- Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views [22.47152165975219]
A data augmentation module is utilized in contrastive learning to transform the given data example into two views.
This paper proposes a general method to alleviate the resulting problems by considering where and what to contrast in a general contrastive learning framework.
arXiv Detail & Related papers (2022-06-01T04:30:46Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns to predict the difference of augmentation parameters between two randomly augmented views.
The intuition is that AugSelf encourages the model to preserve augmentation-aware information in learned representations, which can benefit transferability (a minimal sketch follows this list).
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
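Since the AugSelf entry above describes a concrete auxiliary objective, here is a minimal PyTorch sketch of that idea; the head architecture, the flat parameter encoding, and the MSE objective are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugSelfLoss(nn.Module):
    """Hypothetical AugSelf-style auxiliary head: from the features of two
    augmented views, regress the difference of their augmentation
    parameters (e.g. crop boxes, color-jitter strengths)."""
    def __init__(self, feat_dim=2048, param_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, param_dim))

    def forward(self, h1, h2, params1, params2):
        # Predict the "difference of augmentation parameters" between views.
        pred = self.mlp(torch.cat([h1, h2], dim=1))
        target = params1 - params2
        # Added to the main contrastive loss with a small weight, so the
        # representation keeps augmentation-aware information.
        return F.mse_loss(pred, target)
```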