Feature Dropout: Revisiting the Role of Augmentations in Contrastive
Learning
- URL: http://arxiv.org/abs/2212.08378v1
- Date: Fri, 16 Dec 2022 10:08:38 GMT
- Title: Feature Dropout: Revisiting the Role of Augmentations in Contrastive
Learning
- Authors: Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman
- Abstract summary: Recent work suggests that good augmentations are label-preserving with respect to a specific downstream task.
We show that label-destroying augmentations can be useful in the foundation model setting.
- Score: 7.6834562879925885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What role do augmentations play in contrastive learning? Recent work suggests
that good augmentations are label-preserving with respect to a specific
downstream task. We complicate this picture by showing that label-destroying
augmentations can be useful in the foundation model setting, where the goal is
to learn diverse, general-purpose representations for multiple downstream
tasks. We perform contrastive learning experiments on a range of image and
audio datasets with multiple downstream tasks (e.g. for digits superimposed on
photographs, predicting the class of one vs. the other). We find that Viewmaker
Networks, a recently proposed model for learning augmentations for contrastive
learning, produce label-destroying augmentations that stochastically destroy
features needed for different downstream tasks. These augmentations are
interpretable (e.g. altering shapes, digits, or letters added to images) and
surprisingly often result in better performance compared to expert-designed
augmentations, despite not preserving label information. To support our
empirical results, we theoretically analyze a simple contrastive learning
setting with a linear model. In this setting, label-destroying augmentations
are crucial for preventing one set of features from suppressing the learning of
features useful for another downstream task. Our results highlight the need for
analyzing the interaction between multiple downstream tasks when trying to
explain the success of foundation models.
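The abstract's central idea, an augmentation that stochastically destroys the features one downstream task needs while leaving another task's features intact, can be sketched in a few lines. This is an illustrative toy, not the paper's Viewmaker implementation; the function name and the fixed feature-group layout are assumptions made for the example.

```python
import numpy as np

def feature_dropout_view(x, feature_groups, drop_prob=0.5, rng=None):
    """Produce an augmented view that stochastically zeroes whole
    feature groups (e.g. the 'digit' vs. the 'photo' component of a
    superimposed input). Zeroing a group removes exactly the
    information one downstream task needs -- a label-destroying
    augmentation in the sense of the abstract. Illustrative only."""
    rng = rng or np.random.default_rng()
    view = x.copy()
    for idx in feature_groups.values():
        if rng.random() < drop_prob:
            view[idx] = 0.0  # destroy this group's features
    return view

# Toy input: first 4 dims encode a "digit" feature, last 4 a "photo" feature.
x = np.arange(1, 9, dtype=float)
groups = {"digit": np.arange(0, 4), "photo": np.arange(4, 8)}
v1 = feature_dropout_view(x, groups, rng=np.random.default_rng(0))
v2 = feature_dropout_view(x, groups, rng=np.random.default_rng(1))
# v1 and v2 form a positive pair whose surviving feature groups can differ,
# so neither group of features can suppress the learning of the other.
```

In the paper's theoretical linear-model setting, this kind of stochastic destruction is what prevents one feature set from dominating the learned representation.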
Related papers
- The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- RangeAugment: Efficient Online Augmentation with Range Learning [54.61514286212455]
RangeAugment efficiently learns the range of magnitudes for individual as well as composite augmentation operations.
We show that RangeAugment achieves competitive performance to state-of-the-art automatic augmentation methods with 4-5 times fewer augmentation operations.
arXiv Detail & Related papers (2022-12-20T18:55:54Z)
- EquiMod: An Equivariance Module to Improve Self-Supervised Learning [77.34726150561087]
Self-supervised visual representation methods are closing the gap with supervised learning performance.
These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations.
We introduce EquiMod, a generic equivariance module that structures the learned latent space.
arXiv Detail & Related papers (2022-11-02T16:25:54Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views [22.47152165975219]
A data augmentation module is utilized in contrastive learning to transform the given data example into two views.
This paper proposes a general method to alleviate these two problems by considering where and what to contrast in a general contrastive learning framework.
arXiv Detail & Related papers (2022-06-01T04:30:46Z)
- MetAug: Contrastive Learning via Meta Feature Augmentation [28.708395209321846]
We argue that contrastive learning heavily relies on informative features, or "hard" (positive or negative) features.
The key challenge toward exploring such features is that the source multi-view data is generated by applying random data augmentations.
We propose to directly augment the features in latent space, thereby learning discriminative representations without a large amount of input data.
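The idea of augmenting in latent space rather than input space can be illustrated with a minimal sketch. MetAug meta-learns its feature augmentations; here the perturbation direction is just random noise, and the function name and parameters are assumptions for the example, not the paper's API.

```python
import numpy as np

def augment_in_latent_space(z, strength=0.1, rng=None):
    """Hypothetical latent-space augmentation: instead of transforming
    the raw input, perturb the (unit-norm) embedding directly and
    project back onto the unit sphere. MetAug meta-learns such
    perturbations; this sketch uses random noise for illustration."""
    rng = rng or np.random.default_rng()
    z_aug = z + strength * rng.normal(size=z.shape)
    return z_aug / np.linalg.norm(z_aug)  # renormalize to the sphere

z = np.ones(16) / 4.0  # a unit-norm embedding (||z|| = 1)
z_hard = augment_in_latent_space(z, strength=0.2,
                                 rng=np.random.default_rng(0))
```

Because the augmentation acts on embeddings, it needs no extra raw input data, which is the point the summary makes.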
arXiv Detail & Related papers (2022-03-10T02:35:39Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns the difference of augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the preservation of augmentation-aware information in learned representations, which could be beneficial for their transferability.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
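An AugSelf-style auxiliary objective, predicting the difference of augmentation parameters from a pair of embeddings, can be sketched as follows. The linear head, the parameter count, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def augself_loss(z1, z2, omega1, omega2, head_W):
    """Hedged sketch of an AugSelf-style auxiliary loss: a small head
    predicts the *difference* of augmentation parameters (e.g. crop
    offsets, color-jitter strengths) from the two embeddings, so the
    encoder must retain augmentation-aware information."""
    pred = np.concatenate([z1, z2]) @ head_W      # linear head (illustrative)
    target = omega1 - omega2                      # parameter difference
    return float(np.mean((pred - target) ** 2))   # regression loss

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=8), rng.normal(size=8)  # embeddings of the two views
omega1, omega2 = rng.random(3), rng.random(3)    # 3 augmentation parameters
W = rng.normal(size=(16, 3))                     # toy head weights
loss = augself_loss(z1, z2, omega1, omega2, W)
```

Because the loss is a simple regression term added to the main contrastive objective, it illustrates why the summary describes the extra training cost as negligible.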
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Hard Negative Mixing for Contrastive Learning [29.91220669060252]
We argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected.
We propose hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead.
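Feature-level hard-negative mixing of this kind can be sketched as convex combinations of the hardest negatives in a batch. This is a hedged illustration of the general idea; the function signature and hyperparameters are assumptions, not the paper's API.

```python
import numpy as np

def mix_hard_negatives(query, negatives, k=8, n_mix=4, rng=None):
    """Illustrative sketch of feature-level hard-negative mixing: rank
    negatives by similarity to the query, then synthesize extra
    negatives as convex combinations of random pairs of the k hardest
    ones, renormalized to the unit sphere."""
    rng = rng or np.random.default_rng()
    sims = negatives @ query                    # cosine sims (unit vectors)
    hard = negatives[np.argsort(-sims)[:k]]     # the k hardest negatives
    i, j = rng.integers(0, len(hard), size=(2, n_mix))
    alpha = rng.random((n_mix, 1))              # random mixing coefficients
    mixed = alpha * hard[i] + (1 - alpha) * hard[j]
    return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=32); q /= np.linalg.norm(q)          # query embedding
negs = rng.normal(size=(64, 32))
negs /= np.linalg.norm(negs, axis=1, keepdims=True)      # negative bank
synthetic = mix_hard_negatives(q, negs, rng=np.random.default_rng(1))
```

Since only a handful of vector combinations are computed per query, the on-the-fly overhead the summary mentions stays minimal.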
arXiv Detail & Related papers (2020-10-02T14:34:58Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Diversity Helps: Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation [21.16237189370515]
Few-shot learning aims to learn a new concept when only a few training examples are available.
In this paper, we develop a novel framework called Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation (ULDA).
In experiments, few-shot models learned by ULDA can achieve superior generalization performance.
arXiv Detail & Related papers (2020-04-13T07:41:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.