TiCo: Transformation Invariance and Covariance Contrast for
  Self-Supervised Visual Representation Learning
- URL: http://arxiv.org/abs/2206.10698v2
- Date: Thu, 23 Jun 2022 17:36:11 GMT
- Title: TiCo: Transformation Invariance and Covariance Contrast for
  Self-Supervised Visual Representation Learning
- Authors: Jiachen Zhu, Rafael M. Moraes, Serkan Karakulak, Vlad Sobol, Alfredo
  Canziani, Yann LeCun
- Abstract summary: We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning.
Our method is based on maximizing the agreement among embeddings of different distorted versions of the same image.
We show that TiCo can be viewed as a variant of MoCo with an implicit memory bank of unlimited size at no extra memory cost.
- Score: 9.507070656654632
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present Transformation Invariance and Covariance Contrast (TiCo) for
self-supervised visual representation learning. Similar to other recent
self-supervised learning methods, our method is based on maximizing the
agreement among embeddings of different distorted versions of the same image,
which pushes the encoder to produce transformation invariant representations.
To avoid the trivial solution where the encoder generates constant vectors, we
regularize the covariance matrix of the embeddings from different images by
penalizing low rank solutions. By jointly minimizing the transformation
invariance loss and covariance contrast loss, we get an encoder that is able to
produce useful representations for downstream tasks. We analyze our method and
show that it can be viewed as a variant of MoCo with an implicit memory bank of
unlimited size at no extra memory cost. This makes our method perform better
than alternative methods when using small batch sizes. TiCo can also be seen as
a modification of Barlow Twins. By connecting the contrastive and
redundancy-reduction methods together, TiCo gives us new insights into how
joint embedding methods work.
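
The two terms described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not give the exact form of the loss, so the following are assumptions made for illustration: the function name `tico_style_loss`, the moving-average coefficient `beta`, the weight `rho`, their default values, and the use of an uncentered second-moment matrix as the "covariance". The running matrix `c` plays the role of the implicit memory bank: it summarizes past embeddings in a fixed d x d matrix instead of storing them.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def second_moment(z):
    """Uncentered second-moment matrix (1/n) * sum_i z_i z_i^T (assumed form)."""
    n, d = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(d)]
            for a in range(d)]


def tico_style_loss(c_prev, z1, z2, beta=0.9, rho=8.0):
    """Return (loss, updated running matrix) for two views of the same batch.

    beta and rho are hypothetical hyperparameters, not taken from the text.
    """
    z1 = [l2_normalize(v) for v in z1]
    z2 = [l2_normalize(v) for v in z2]
    n, d = len(z1), len(z1[0])

    # Running summary of past embeddings: an exponential moving average of
    # the batch second-moment matrix, so memory stays O(d^2).
    b = second_moment(z1)
    c = [[beta * c_prev[a][k] + (1.0 - beta) * b[a][k] for k in range(d)]
         for a in range(d)]

    # Transformation invariance term: reward agreement between the two views.
    invariance = -sum(sum(x * y for x, y in zip(u, v))
                      for u, v in zip(z1, z2)) / n

    # Covariance contrast term: penalize embeddings that lie along directions
    # already dominant in c, which discourages low-rank (collapsed) solutions.
    contrast = sum(sum(u[a] * c[a][k] * u[k]
                       for a in range(d) for k in range(d))
                   for u in z1) / n

    return invariance + rho * contrast, c


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    collapsed = [[1.0, 0.0], [1.0, 0.0]]   # every image maps to the same vector
    spread = [[1.0, 0.0], [0.0, 1.0]]      # images map to different directions
    loss_collapsed, _ = tico_style_loss(zeros, collapsed, collapsed)
    loss_spread, _ = tico_style_loss(zeros, spread, spread)
    print(loss_collapsed, loss_spread)
```

In this toy run both batches agree perfectly across views, so the invariance term is identical; the collapsed batch still receives the higher (worse) loss because all of its embeddings lie along the single direction that dominates the running matrix.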
 
      
Related papers
        - Self-supervised Transformation Learning for Equivariant Representations [26.207358743969277]
 Unsupervised representation learning has significantly advanced various machine learning tasks.
We propose Self-supervised Transformation Learning (STL), replacing transformation labels with transformation representations derived from image pairs.
We demonstrate the approach's effectiveness across diverse classification and detection tasks, outperforming existing methods in 7 out of 11 benchmarks.
 arXiv (2025-01-15T10:54:21Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
 PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations.
We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
 arXiv (2024-09-24T07:57:21Z)
- Which Tokens to Use? Investigating Token Reduction in Vision
  Transformers [64.99704164972513]
 We study the reduction patterns of 10 different token reduction methods using four image classification datasets.
We find that the Top-K pruning method is a surprisingly strong baseline.
The similarity of reduction patterns is a moderate-to-strong proxy for model performance.
 arXiv (2023-08-09T01:51:07Z)
- EquiMod: An Equivariance Module to Improve Self-Supervised Learning [77.34726150561087]
 Self-supervised visual representation methods are closing the gap with supervised learning performance.
These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations.
We introduce EquiMod, a generic equivariance module that structures the learned latent space.
 arXiv (2022-11-02T16:25:54Z)
- A Reinforcement Learning Approach for Sequential Spatial Transformer
  Networks [6.585049648605185]
 We formulate the task as a Markovian Decision Process (MDP) and use RL to solve this sequential decision-making problem.
In our method, we are not bound to the differentiability of the sampling modules.
We design multiple experiments to verify the effectiveness of our method using cluttered MNIST and Fashion-MNIST datasets.
 arXiv (2021-06-27T17:41:17Z)
- VICReg: Variance-Invariance-Covariance Regularization for
  Self-Supervised Learning [43.96465407127458]
 We introduce VICReg, a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings.
 VICReg achieves results on par with the state of the art on several downstream tasks.
 arXiv (2021-05-11T09:53:21Z)
- Invariant Deep Compressible Covariance Pooling for Aerial Scene
  Categorization [80.55951673479237]
 We propose a novel invariant deep compressible covariance pooling (IDCCP) to solve nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
 arXiv (2020-11-11T11:13:07Z)
- Improving Transformation Invariance in Contrastive Representation
  Learning [31.223892428863238]
 First, we introduce a training objective for contrastive learning that uses a novel regularizer to control how the representation changes under transformation.
Second, we propose a change to how test time representations are generated by introducing a feature averaging approach that combines encodings from multiple transformations of the original input.
Third, we introduce the novel Spirograph dataset to explore our ideas in the context of a differentiable generative process with multiple downstream tasks.
 arXiv (2020-10-19T13:49:29Z)
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
 We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to the current state of the art for smaller datasets while being able to scale up to larger datasets.
 arXiv (2020-07-16T17:55:31Z)
- Meta-Learning Symmetries by Reparameterization [63.85144439337671]
 We present a method for learning and encoding equivariances into networks by learning corresponding parameter sharing patterns from data.
Our experiments suggest that it can automatically learn to encode equivariances to common transformations used in image processing tasks.
 arXiv (2020-07-06T17:59:54Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster
  Assignments [57.33699905852397]
 We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
 arXiv (2020-06-17T14:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     