Feature Dropout: Revisiting the Role of Augmentations in Contrastive
Learning
- URL: http://arxiv.org/abs/2212.08378v1
- Date: Fri, 16 Dec 2022 10:08:38 GMT
- Title: Feature Dropout: Revisiting the Role of Augmentations in Contrastive
Learning
- Authors: Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman
- Abstract summary: Recent work suggests that good augmentations are label-preserving with respect to a specific downstream task.
We show that label-destroying augmentations can be useful in the foundation model setting.
- Score: 7.6834562879925885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What role do augmentations play in contrastive learning? Recent work suggests
that good augmentations are label-preserving with respect to a specific
downstream task. We complicate this picture by showing that label-destroying
augmentations can be useful in the foundation model setting, where the goal is
to learn diverse, general-purpose representations for multiple downstream
tasks. We perform contrastive learning experiments on a range of image and
audio datasets with multiple downstream tasks (e.g. for digits superimposed on
photographs, predicting the class of one vs. the other). We find that Viewmaker
Networks, a recently proposed model for learning augmentations for contrastive
learning, produce label-destroying augmentations that stochastically destroy
features needed for different downstream tasks. These augmentations are
interpretable (e.g. altering shapes, digits, or letters added to images) and
surprisingly often result in better performance compared to expert-designed
augmentations, despite not preserving label information. To support our
empirical results, we theoretically analyze a simple contrastive learning
setting with a linear model. In this setting, label-destroying augmentations
are crucial for preventing one set of features from suppressing the learning of
features useful for another downstream task. Our results highlight the need for
analyzing the interaction between multiple downstream tasks when trying to
explain the success of foundation models.
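The abstract's central idea, an augmentation that stochastically destroys the features one downstream task needs while leaving another task's features intact, can be sketched in a few lines. This is an illustrative toy, not the paper's Viewmaker implementation; the function name and the fixed feature-group layout are assumptions made for the example.

```python
import numpy as np

def feature_dropout_view(x, feature_groups, drop_prob=0.5, rng=None):
    """Produce an augmented view that stochastically zeroes whole
    feature groups (e.g. the 'digit' vs. the 'photo' component of a
    superimposed input). Zeroing a group removes exactly the
    information one downstream task needs -- a label-destroying
    augmentation in the sense of the abstract. Illustrative only."""
    rng = rng or np.random.default_rng()
    view = x.copy()
    for idx in feature_groups.values():
        if rng.random() < drop_prob:
            view[idx] = 0.0  # destroy this group's features
    return view

# Toy input: first 4 dims encode a "digit" feature, last 4 a "photo" feature.
x = np.arange(1, 9, dtype=float)
groups = {"digit": np.arange(0, 4), "photo": np.arange(4, 8)}
v1 = feature_dropout_view(x, groups, rng=np.random.default_rng(0))
v2 = feature_dropout_view(x, groups, rng=np.random.default_rng(1))
# v1 and v2 form a positive pair whose surviving feature groups can differ,
# so neither group of features can suppress the learning of the other.
```

In the paper's theoretical linear-model setting, this kind of stochastic destruction is what prevents one feature set from dominating the learned representation.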
Related papers
- The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- RangeAugment: Efficient Online Augmentation with Range Learning [54.61514286212455]
RangeAugment efficiently learns the range of magnitudes for individual as well as composite augmentation operations.
We show that RangeAugment achieves competitive performance to state-of-the-art automatic augmentation methods with 4-5 times fewer augmentation operations.
arXiv Detail & Related papers (2022-12-20T18:55:54Z)
- EquiMod: An Equivariance Module to Improve Self-Supervised Learning [77.34726150561087]
Self-supervised visual representation methods are closing the gap with supervised learning performance.
These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations.
We introduce EquiMod, a generic equivariance module that structures the learned latent space.
arXiv Detail & Related papers (2022-11-02T16:25:54Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views [22.47152165975219]
A data augmentation module is utilized in contrastive learning to transform the given data example into two views.
This paper proposes a general method to alleviate these two problems by considering where and what to contrast in a general contrastive learning framework.
arXiv Detail & Related papers (2022-06-01T04:30:46Z)
- MetAug: Contrastive Learning via Meta Feature Augmentation [28.708395209321846]
We argue that contrastive learning heavily relies on informative features, or "hard" (positive or negative) features.
The key challenge toward exploring such features is that the source multi-view data is generated by applying random data augmentations.
We propose to directly augment the features in latent space, thereby learning discriminative representations without a large amount of input data.
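The idea of augmenting in latent space rather than input space can be illustrated with a minimal sketch. MetAug meta-learns its feature augmentations; here the perturbation direction is just random noise, and the function name and parameters are assumptions for the example, not the paper's API.

```python
import numpy as np

def augment_in_latent_space(z, strength=0.1, rng=None):
    """Hypothetical latent-space augmentation: instead of transforming
    the raw input, perturb the (unit-norm) embedding directly and
    project back onto the unit sphere. MetAug meta-learns such
    perturbations; this sketch uses random noise for illustration."""
    rng = rng or np.random.default_rng()
    z_aug = z + strength * rng.normal(size=z.shape)
    return z_aug / np.linalg.norm(z_aug)  # renormalize to the sphere

z = np.ones(16) / 4.0  # a unit-norm embedding (||z|| = 1)
z_hard = augment_in_latent_space(z, strength=0.2,
                                 rng=np.random.default_rng(0))
```

Because the augmentation acts on embeddings, it needs no extra raw input data, which is the point the summary makes.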
arXiv Detail & Related papers (2022-03-10T02:35:39Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns the difference of augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the preservation of augmentation-aware information in learned representations, which could be beneficial for their transferability.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
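An AugSelf-style auxiliary objective, predicting the difference of augmentation parameters from a pair of embeddings, can be sketched as follows. The linear head, the parameter count, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def augself_loss(z1, z2, omega1, omega2, head_W):
    """Hedged sketch of an AugSelf-style auxiliary loss: a small head
    predicts the *difference* of augmentation parameters (e.g. crop
    offsets, color-jitter strengths) from the two embeddings, so the
    encoder must retain augmentation-aware information."""
    pred = np.concatenate([z1, z2]) @ head_W      # linear head (illustrative)
    target = omega1 - omega2                      # parameter difference
    return float(np.mean((pred - target) ** 2))   # regression loss

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=8), rng.normal(size=8)  # embeddings of the two views
omega1, omega2 = rng.random(3), rng.random(3)    # 3 augmentation parameters
W = rng.normal(size=(16, 3))                     # toy head weights
loss = augself_loss(z1, z2, omega1, omega2, W)
```

Because the loss is a simple regression term added to the main contrastive objective, it illustrates why the summary describes the extra training cost as negligible.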
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Hard Negative Mixing for Contrastive Learning [29.91220669060252]
We argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected.
We propose hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead.
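Feature-level hard-negative mixing of this kind can be sketched as convex combinations of the hardest negatives in a batch. This is a hedged illustration of the general idea; the function signature and hyperparameters are assumptions, not the paper's API.

```python
import numpy as np

def mix_hard_negatives(query, negatives, k=8, n_mix=4, rng=None):
    """Illustrative sketch of feature-level hard-negative mixing: rank
    negatives by similarity to the query, then synthesize extra
    negatives as convex combinations of random pairs of the k hardest
    ones, renormalized to the unit sphere."""
    rng = rng or np.random.default_rng()
    sims = negatives @ query                    # cosine sims (unit vectors)
    hard = negatives[np.argsort(-sims)[:k]]     # the k hardest negatives
    i, j = rng.integers(0, len(hard), size=(2, n_mix))
    alpha = rng.random((n_mix, 1))              # random mixing coefficients
    mixed = alpha * hard[i] + (1 - alpha) * hard[j]
    return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=32); q /= np.linalg.norm(q)          # query embedding
negs = rng.normal(size=(64, 32))
negs /= np.linalg.norm(negs, axis=1, keepdims=True)      # negative bank
synthetic = mix_hard_negatives(q, negs, rng=np.random.default_rng(1))
```

Since only a handful of vector combinations are computed per query, the on-the-fly overhead the summary mentions stays minimal.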
arXiv Detail & Related papers (2020-10-02T14:34:58Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Diversity Helps: Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation [21.16237189370515]
Few-shot learning aims to learn a new concept when only a few training examples are available.
In this paper, we develop a novel framework called Unsupervised Few-shot Learning via Distribution Shift-based Data Augmentation (ULDA).
In experiments, few-shot models learned by ULDA can achieve superior generalization performance.
arXiv Detail & Related papers (2020-04-13T07:41:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.