Automatic Shortcut Removal for Self-Supervised Representation Learning
- URL: http://arxiv.org/abs/2002.08822v3
- Date: Tue, 30 Jun 2020 11:15:48 GMT
- Title: Automatic Shortcut Removal for Self-Supervised Representation Learning
- Authors: Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen
- Abstract summary: In self-supervised visual representation learning, a feature extractor is trained on a "pretext task" for which labels can be generated cheaply, without human annotation.
The feature extractor, however, often exploits low-level "shortcut" features (such as color aberrations or watermarks) instead of learning useful semantic representations; much work has gone into identifying such features and hand-designing schemes to reduce their effect.
We show that the key assumption, namely that the features first exploited to solve the pretext task are also the most vulnerable to an adversary trained to make the task harder, holds across common pretext tasks and datasets by training a "lens" network to make small image changes that maximally reduce performance in the pretext task.
- Score: 39.636691159890354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In self-supervised visual representation learning, a feature extractor is
trained on a "pretext task" for which labels can be generated cheaply, without
human annotation. A central challenge in this approach is that the feature
extractor quickly learns to exploit low-level visual features such as color
aberrations or watermarks and then fails to learn useful semantic
representations. Much work has gone into identifying such "shortcut" features
and hand-designing schemes to reduce their effect. Here, we propose a general
framework for mitigating the effect of shortcut features. Our key assumption is
that those features which are the first to be exploited for solving the pretext
task may also be the most vulnerable to an adversary trained to make the task
harder. We show that this assumption holds across common pretext tasks and
datasets by training a "lens" network to make small image changes that
maximally reduce performance in the pretext task. Representations learned with
the modified images outperform those learned without in all tested cases.
Additionally, the modifications made by the lens reveal how the choice of
pretext task and dataset affects the features learned by self-supervision.
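As a rough illustration of the adversarial "lens" idea described in the abstract, the minimal PyTorch-style sketch below (not the authors' implementation; the rotation-prediction pretext task, the network sizes, and the L2 reconstruction penalty used to keep the lens changes small are all assumptions) alternates between updating a small image-to-image lens network to increase the pretext loss and updating the feature extractor on the lens-modified images to decrease it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of adversarial "lens" training (not the authors' code).
# Assumptions: rotation prediction as the pretext task, a tiny convolutional
# lens, and an L2 reconstruction penalty to keep the image changes small.

class Lens(nn.Module):
    """Small image-to-image network that applies a residual change to the input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)

class Encoder(nn.Module):
    """Feature extractor with a 4-way rotation-prediction head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 4)

    def forward(self, x):
        return self.head(self.features(x))

def rotation_pretext(images):
    """Rotate each image by a random multiple of 90 degrees; the multiple is the label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

lens, model = Lens(), Encoder()
opt_lens = torch.optim.Adam(lens.parameters(), lr=1e-4)
opt_model = torch.optim.Adam(model.parameters(), lr=1e-4)
recon_weight = 10.0  # assumed weight for keeping lens changes small

def training_step(images):
    rotated, labels = rotation_pretext(images)

    # 1) Lens update: make image changes that maximally hurt the pretext task,
    #    while the reconstruction penalty keeps the modified image close to the input.
    modified = lens(rotated)
    pretext_loss = F.cross_entropy(model(modified), labels)
    lens_loss = -pretext_loss + recon_weight * F.mse_loss(modified, rotated)
    opt_lens.zero_grad(); lens_loss.backward(); opt_lens.step()

    # 2) Extractor update: learn the pretext task on the lens-modified images,
    #    which should no longer contain the easiest shortcut features.
    modified = lens(rotated).detach()
    model_loss = F.cross_entropy(model(modified), labels)
    opt_model.zero_grad(); model_loss.backward(); opt_model.step()
    return model_loss.item()

# Example call on a random batch:
# training_step(torch.rand(8, 3, 32, 32))
```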
Related papers
- What Makes Pre-Trained Visual Representations Successful for Robust
Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by training the model on a pretext task before applying it to a specific downstream task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z) - Learning Transferable Pedestrian Representation from Multimodal
Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z) - Self-Supervision on Images and Text Reduces Reliance on Visual Shortcut
Features [0.0]
Shortcut features are inputs that are associated with the outcome of interest in the training data, but are either no longer associated or not present in testing or deployment settings.
We show that self-supervised models trained on images and text provide more robust image representations and reduce the model's reliance on visual shortcut features.
arXiv Detail & Related papers (2022-06-14T20:33:26Z) - Improving Transferability of Representations via Augmentation-Aware
Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns the difference of augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the model to preserve augmentation-aware information in learned representations, which could be beneficial for their transferability (a rough sketch of this auxiliary loss appears after the related-papers list below).
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
arXiv Detail & Related papers (2021-11-18T10:43:50Z) - Can contrastive learning avoid shortcut solutions? [88.249082564465]
Implicit feature modification (IFM) is a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features.
IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks.
arXiv Detail & Related papers (2021-06-21T16:22:43Z) - Automated Self-Supervised Learning for Graphs [37.14382990139527]
This work aims to investigate how to automatically leverage multiple pretext tasks effectively.
We make use of a key principle of many real-world graphs, i.e., homophily, as the guidance to effectively search various self-supervised pretext tasks.
We propose the AutoSSL framework which can automatically search over combinations of various self-supervised tasks.
arXiv Detail & Related papers (2021-06-10T03:09:20Z) - Self-supervision of Feature Transformation for Further Improving
Supervised Learning [6.508466234920147]
We find that features in CNNs can also be used for self-supervision.
In our task, we discard particular regions of the feature maps and then train the model to distinguish the resulting features.
Original labels will be expanded to joint labels via self-supervision of feature transformations.
arXiv Detail & Related papers (2021-06-09T09:06:33Z) - Predicting What You Already Know Helps: Provable Self-Supervised
Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z) - Self-supervised visual feature learning with curriculum [0.24366811507669126]
This paper takes inspiration from curriculum learning to progressively remove low-level signals.
It shows that this significantly increases the speed of convergence on the downstream task.
arXiv Detail & Related papers (2020-01-16T03:28:58Z)
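As a rough sketch of the augmentation-aware auxiliary loss described in the AugSelf entry above (a guess at the general idea, not the authors' code; the toy backbone, the negative-cosine base objective, and the use of crop-box coordinates as augmentation parameters are all assumptions), a small head is trained to regress the difference of augmentation parameters between two augmented views alongside the main self-supervised objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rough sketch of an augmentation-aware auxiliary loss in the spirit of AugSelf
# (not the authors' code): alongside a standard self-supervised objective, a
# small head predicts the difference of augmentation parameters (here, the
# random-crop box coordinates) between two augmented views of the same image.

class AugAwareModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_aug_params=4):
        super().__init__()
        self.backbone = backbone
        # Head takes features of both views and regresses the parameter difference.
        self.aug_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_aug_params),
        )

    def forward(self, view1, view2):
        z1, z2 = self.backbone(view1), self.backbone(view2)
        pred_diff = self.aug_head(torch.cat([z1, z2], dim=1))
        return z1, z2, pred_diff

def augself_loss(z1, z2, pred_diff, true_diff, weight=1.0):
    # Main objective: negative cosine similarity between the two views
    # (a stand-in for whatever base method, e.g. SimSiam or MoCo, is used).
    main = -F.cosine_similarity(z1, z2, dim=1).mean()
    # Auxiliary objective: regress the difference of augmentation parameters,
    # encouraging the representation to retain augmentation-aware information.
    aux = F.mse_loss(pred_diff, true_diff)
    return main + weight * aux

# Usage with a toy backbone and random data (4 crop-box parameters per view):
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = AugAwareModel(backbone, feat_dim=128)
v1, v2 = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)
params1, params2 = torch.rand(8, 4), torch.rand(8, 4)  # normalized crop boxes
z1, z2, pred = model(v1, v2)
loss = augself_loss(z1, z2, pred, params1 - params2)
loss.backward()
```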