On Compositions of Transformations in Contrastive Self-Supervised
Learning
- URL: http://arxiv.org/abs/2003.04298v3
- Date: Wed, 27 Oct 2021 12:00:29 GMT
- Title: On Compositions of Transformations in Contrastive Self-Supervised
Learning
- Authors: Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
- Abstract summary: In this paper, we generalize contrastive learning to a wider set of transformations.
We find that being invariant to certain transformations and distinctive to others is critical to learning effective video representations.
- Score: 66.15514035861048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the image domain, excellent representations can be learned by inducing
invariance to content-preserving transformations via noise contrastive
learning. In this paper, we generalize contrastive learning to a wider set of
transformations, and their compositions, for which either invariance or
distinctiveness is sought. We show that it is not immediately obvious how
existing methods such as SimCLR can be extended to do so. Instead, we introduce
a number of formal requirements that all contrastive formulations must satisfy,
and propose a practical construction which satisfies these requirements. In
order to maximise the reach of this analysis, we express all components of
noise contrastive formulations as the choice of certain generalized
transformations of the data (GDTs), including data sampling. We then consider
videos as an example of data in which a large variety of transformations are
applicable, accounting for the extra modalities -- for which we analyze audio
and text -- and the dimension of time. We find that being invariant to certain
transformations and distinctive to others is critical to learning effective
video representations, improving the state-of-the-art for multiple benchmarks
by a large margin, and even surpassing supervised pretraining.
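The invariance-versus-distinctiveness idea at the core of the abstract can be illustrated with a small noise-contrastive sketch. The code below is a minimal, hypothetical illustration rather than the authors' GDT implementation: the encoder `f` and the transformation callables `invariant_t` and `distinctive_t` are assumed names, and the loss is a standard InfoNCE-style objective in which views produced by the "invariant" transformation form positive pairs while views produced by the "distinctive" transformation are pushed apart as extra negatives.

```python
import torch
import torch.nn.functional as F

def gdt_style_contrastive_loss(f, x, invariant_t, distinctive_t, temperature=0.07):
    """Sketch of a contrastive loss that is invariant to one transformation
    and distinctive to another (hypothetical API, not the paper's code)."""
    # Two views related by the "invariant" transformation form a positive pair.
    z_anchor = F.normalize(f(invariant_t(x)), dim=1)   # (B, D)
    z_pos = F.normalize(f(invariant_t(x)), dim=1)      # (B, D), a second random draw
    # Views produced by the "distinctive" transformation act as extra negatives,
    # so the representation is pushed to distinguish them.
    z_neg = F.normalize(f(distinctive_t(x)), dim=1)    # (B, D)

    logits_pos = (z_anchor * z_pos).sum(dim=1, keepdim=True)  # (B, 1) anchor vs its positive
    logits_batch = z_anchor @ z_pos.t()                        # (B, B) in-batch negatives
    logits_neg = z_anchor @ z_neg.t()                          # (B, B) distinctive negatives

    # Remove the positive pair from the in-batch negatives.
    mask = torch.eye(x.shape[0], dtype=torch.bool, device=z_anchor.device)
    logits_batch = logits_batch.masked_fill(mask, float('-inf'))

    logits = torch.cat([logits_pos, logits_batch, logits_neg], dim=1) / temperature
    targets = torch.zeros(x.shape[0], dtype=torch.long, device=logits.device)  # positive at index 0
    return F.cross_entropy(logits, targets)
```

In the paper's framing, a whole composition of such choices (data sampling, modality projection, time shift or reversal, augmentation), each marked as either invariant or distinctive, defines a generalized data transformation; the sketch above only covers the simplest two-transformation case.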
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- In-Context Symmetries: Self-Supervised Learning through Contextual World Models [41.61360016455319]
We propose to learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context.
Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations.
arXiv Detail & Related papers (2024-05-28T14:03:52Z)
- From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication [19.336940758147442]
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases.
We introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations.
We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting.
arXiv Detail & Related papers (2023-10-02T13:55:38Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
arXiv Detail & Related papers (2022-11-09T16:16:06Z)
- Data augmentation with mixtures of max-entropy transformations for filling-level classification [88.14088768857242]
We address the problem of distribution shifts in test-time data with a principled data augmentation scheme for the task of filling-level classification.
We show that such a principled augmentation scheme, alone, can replace current approaches that use transfer learning or can be used in combination with transfer learning to improve its performance.
arXiv Detail & Related papers (2022-03-08T11:41:38Z)
- Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
- Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [32.20957709045773]
We formulate the augmentation process as a latent variable model.
We study the identifiability of the latent representation based on pairs of views of the observations.
We introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies.
arXiv Detail & Related papers (2021-06-08T18:18:09Z)
- Improving Transformation Invariance in Contrastive Representation Learning [31.223892428863238]
First, we introduce a training objective for contrastive learning that uses a novel regularizer to control how the representation changes under transformation.
Second, we propose a change to how test-time representations are generated by introducing a feature averaging approach that combines encodings from multiple transformations of the original input (a minimal sketch of this averaging step follows after this list).
Third, we introduce the novel Spirograph dataset to explore our ideas in the context of a differentiable generative process with multiple downstream tasks.
arXiv Detail & Related papers (2020-10-19T13:49:29Z)
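The test-time feature averaging mentioned in the last entry is simple enough to sketch directly. The snippet below is an assumption-laden illustration, not the authors' code: the encoder `f` and the list of transformation callables are hypothetical, and the representation of an input is just the mean of the encodings of several transformed copies.

```python
import torch

def averaged_representation(f, x, transforms):
    """Test-time feature averaging sketch (hypothetical API): encode several
    transformed copies of the input and average the embeddings."""
    with torch.no_grad():
        feats = torch.stack([f(t(x)) for t in transforms], dim=0)  # (T, B, D)
    return feats.mean(dim=0)                                       # (B, D)
```

Averaging over transformations of the same input tends to cancel the nuisance variation that the training objective was asked to ignore, which is why it can pair naturally with an invariance-regularized contrastive loss.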
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.