CIPER: Combining Invariant and Equivariant Representations Using
Contrastive and Predictive Learning
- URL: http://arxiv.org/abs/2302.02330v2
- Date: Tue, 18 Jul 2023 20:54:24 GMT
- Authors: Xia Xu, Jochen Triesch
- Abstract summary: We introduce Contrastive Invariant and Predictive Equivariant Representation learning (CIPER).
CIPER comprises both invariant and equivariant learning objectives using one shared encoder and two different output heads on top of the encoder.
We evaluate our method on static image tasks and time-augmented image datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning (SSRL) methods have shown great
success in computer vision. In recent studies, augmentation-based contrastive
learning methods have been proposed for learning representations that are
invariant or equivariant to pre-defined data augmentation operations. However,
invariant or equivariant features favor only specific downstream tasks
depending on the augmentations chosen. They may result in poor performance when
the learned representation does not match task requirements. Here, we consider
an active observer that can manipulate views of an object and has knowledge of
the action(s) that generated each view. We introduce Contrastive Invariant and
Predictive Equivariant Representation learning (CIPER). CIPER comprises both
invariant and equivariant learning objectives using one shared encoder and two
different output heads on top of the encoder. One output head is a projection
head with a state-of-the-art contrastive objective to encourage invariance to
augmentations. The other is a prediction head estimating the augmentation
parameters, capturing equivariant features. Both heads are discarded after
training and only the encoder is used for downstream tasks. We evaluate our
method on static image tasks and time-augmented image datasets. Our results
show that CIPER outperforms a baseline contrastive method on various tasks.
Interestingly, CIPER encourages the formation of hierarchically structured
representations where different views of an object become systematically
organized in the latent representation space.
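The abstract describes the architecture only at a high level, so the sketch below is one hypothetical PyTorch rendering of the two-head design: a shared encoder, a projection head trained with a SimCLR-style NT-Xent contrastive loss (the abstract does not name its "state-of-the-art contrastive objective"), and a prediction head regressing augmentation parameters from the pair of views. All layer sizes, the 4-dimensional augmentation parameterization, and the loss weighting are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CIPER(nn.Module):
    """Shared encoder with an invariance (projection) head and an
    equivariance (prediction) head, per the abstract's description."""
    def __init__(self, encoder: nn.Module, feat_dim=512, proj_dim=128, aug_dim=4):
        super().__init__()
        self.encoder = encoder                        # kept for downstream tasks
        self.proj_head = nn.Sequential(               # discarded after training
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim))
        self.pred_head = nn.Sequential(               # discarded after training
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, aug_dim))

    def forward(self, x1, x2):
        h1, h2 = self.encoder(x1), self.encoder(x2)
        z1 = F.normalize(self.proj_head(h1), dim=1)   # invariance branch
        z2 = F.normalize(self.proj_head(h2), dim=1)
        aug_pred = self.pred_head(torch.cat([h1, h2], dim=1))  # equivariance branch
        return z1, z2, aug_pred

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss over a batch of paired views."""
    z = torch.cat([z1, z2], dim=0)                    # (2N, d), rows L2-normalized
    sim = z @ z.t() / tau                             # cosine-similarity logits
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float('-inf'))        # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(sim.device)
    return F.cross_entropy(sim, targets)              # positive of row i is i +/- N

def ciper_loss(model, x1, x2, aug_params, w=1.0):
    """Contrastive invariance term plus weighted regression of augmentation parameters."""
    z1, z2, aug_pred = model(x1, x2)
    return nt_xent(z1, z2) + w * F.mse_loss(aug_pred, aug_params)
```

After training, both heads would be dropped and only `model.encoder` kept for downstream evaluation, matching the protocol described in the abstract.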
Related papers
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results in low-shot settings and strong experimental results under various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Amortised Invariance Learning for Contrastive Self-Supervision
We introduce the notion of amortised invariance learning for contrastive self-supervision.
We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements.
This provides an exciting perspective that opens up new horizons in the field of general-purpose representation learning.
arXiv Detail & Related papers (2023-02-24T16:15:11Z)
- Self-supervised learning of Split Invariant Equivariant representations
We introduce 3DIEBench, consisting of renderings of 3D models from 55 classes and more than 2.5 million images, where we have full control over the transformations applied to the objects.
We introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance.
We introduce SIE (Split Invariant-Equivariant), which combines the hypernetwork-based predictor with representations split into two parts, one invariant and the other equivariant, to learn richer representations.
arXiv Detail & Related papers (2023-02-14T07:53:18Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision
AugSelf is an auxiliary self-supervised loss that learns to predict the difference between the augmentation parameters of two randomly augmented samples.
The intuition is that AugSelf encourages preserving augmentation-aware information in the learned representations, which can benefit their transferability; a minimal sketch of this kind of objective appears after this list.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods at negligible additional training cost.
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Improving Contrastive Learning by Visualizing Feature Transformation
In this paper, we devise a feature-level data manipulation, distinct from data augmentation, to enhance generic contrastive self-supervised learning.
We first design a visualization scheme for the distribution of positive/negative pair similarity scores, which enables us to analyze, interpret, and understand the learning process.
Experiments show that our proposed Feature Transformation improves accuracy by at least 6.0% on ImageNet-100 over the MoCo baseline and by about 2.0% on ImageNet-1K over the MoCoV2 baseline.
arXiv Detail & Related papers (2021-08-06T07:26:08Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- What Should Not Be Contrastive in Contrastive Learning
We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances.
Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces.
We use a multi-head network with a shared backbone that captures information across the augmentations and on its own outperforms all baselines on downstream tasks.
arXiv Detail & Related papers (2020-08-13T03:02:32Z)
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
Recent performance gains come from training instance classification models that treat each image and its augmented versions as samples of a single class.
We first demonstrate that approaches like MoCo and PIRL learn occlusion-invariant representations.
Second, we demonstrate that these approaches obtain further gains from access to a clean, object-centric training dataset like ImageNet.
arXiv Detail & Related papers (2020-07-28T00:11:31Z)
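The AugSelf entry above describes its auxiliary objective concretely enough to sketch. The following is a hedged illustration, not the authors' code: it assumes features from a shared backbone and a 3-dimensional augmentation-parameter vector (say, crop offsets plus a color-jitter strength); the head shape and parameterization are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugSelfHead(nn.Module):
    """Predicts the difference between the augmentation parameters of two views."""
    def __init__(self, feat_dim=512, aug_dim=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, aug_dim))

    def forward(self, h1, h2):
        # h1, h2: backbone features of two augmented views of the same image
        return self.mlp(torch.cat([h1, h2], dim=1))

def augself_loss(head, h1, h2, params1, params2):
    # Regress the parameter difference; added to the base SSL loss with a small weight.
    return F.mse_loss(head(h1, h2), params1 - params2)
```

Because this adds only a small MLP and one regression term on top of the base self-supervised loss, it is consistent with the entry's claim of negligible additional training cost.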
This list is automatically generated from the titles and abstracts of the papers on this site.