Exploring the Equivalence of Siamese Self-Supervised Learning via A
Unified Gradient Framework
- URL: http://arxiv.org/abs/2112.05141v1
- Date: Thu, 9 Dec 2021 18:59:57 GMT
- Title: Exploring the Equivalence of Siamese Self-Supervised Learning via A
Unified Gradient Framework
- Authors: Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao
Huang, Jifeng Dai
- Abstract summary: Self-supervised learning has shown great potential for extracting powerful visual representations without human annotations.
Various works have been proposed to address self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
- Score: 43.76337849044254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has shown great potential for extracting
powerful visual representations without human annotations. Various works have
been proposed to address self-supervised learning from different perspectives:
(1) contrastive learning methods (e.g., MoCo, SimCLR) utilize both positive and
negative samples to guide the training direction; (2) asymmetric network
methods (e.g., BYOL, SimSiam) dispense with negative samples by introducing a
predictor network and the stop-gradient operation; (3) feature decorrelation
methods (e.g., Barlow Twins, VICReg) instead aim to reduce the redundancy
between feature dimensions. These methods design seemingly quite different loss
functions from various motivations. Their final accuracy numbers also vary,
since different works employ different networks and training tricks. In this
work, we demonstrate that these methods can be unified into the same form.
Instead of comparing their loss functions, we derive a unified formula through
gradient analysis. Furthermore, we conduct fair and detailed experiments to
compare their performance. It turns out that there is little gap between these
methods, and the use of a momentum encoder is the key factor for boosting
performance. From this unified framework, we propose UniGrad, a simple but
effective gradient form for self-supervised learning. It requires neither a
memory bank nor a predictor network, yet still achieves state-of-the-art
performance and easily accommodates other training strategies. Extensive
experiments on linear evaluation and many downstream tasks also demonstrate its
effectiveness. Code will be released.
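The gradient-level view taken in the abstract can be made concrete with a small example. The sketch below differentiates a plain InfoNCE-style contrastive loss (not the paper's UniGrad form; the function name, temperature value, and vectors are illustrative assumptions) and shows that the gradient decomposes into a pull toward the positive sample and softmax-weighted pushes from the negatives:

```python
import numpy as np

def infonce_grad(u, v_pos, v_negs, tau=0.1):
    """Analytic gradient of an InfoNCE-style contrastive loss w.r.t. the
    embedding u, assuming all vectors are already L2-normalized.
    Illustrative sketch only; not the paper's UniGrad formula."""
    sims = np.array([u @ v_pos] + [u @ n for n in v_negs]) / tau
    w = np.exp(sims - sims.max())   # stable softmax weights
    w /= w.sum()
    # grad = (w_pos - 1)/tau * v_pos  +  sum_j (w_j / tau) * v_neg_j
    grad = (w[0] - 1.0) / tau * v_pos
    for wj, n in zip(w[1:], v_negs):
        grad += wj / tau * n
    return grad
```

A gradient-descent step along the negative of this vector moves `u` toward the positive view and away from the negatives, which is the common structure the paper's analysis exposes across method families.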
Related papers
- CUCL: Codebook for Unsupervised Continual Learning [129.91731617718781]
The focus of this study is Unsupervised Continual Learning (UCL), which presents an alternative to supervised continual learning.
We propose a method named Codebook for Unsupervised Continual Learning (CUCL), which encourages the model to learn discriminative features that complete the class boundary.
Our method significantly boosts the performance of both supervised and unsupervised methods.
arXiv Detail & Related papers (2023-11-25T03:08:50Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can improve performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
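The intervention-as-reward idea can be sketched in a few lines. The data layout (a list of transition dicts with an `intervened` flag) and the reward value of -1.0 per intervention are illustrative assumptions, not the paper's actual interface:

```python
def label_rewards(transitions):
    """Derive RL rewards from intervention signals: a transition at which
    the expert intervened gets reward -1.0, all others get 0.0. The
    labeled transitions can then feed any off-policy RL algorithm."""
    return [dict(t, reward=-1.0 if t["intervened"] else 0.0)
            for t in transitions]
```

Minimizing interventions then becomes the learning signal, with no assumption that the intervening expert's own actions are optimal.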
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning [32.18543787821028]
This paper proposes an adaptive technique of batch fusion for self-supervised contrastive learning.
It achieves state-of-the-art performance under equitable comparisons.
We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research.
arXiv Detail & Related papers (2023-11-16T15:47:49Z) - A Study of Forward-Forward Algorithm for Self-Supervised Learning [65.268245109828]
We study the performance of forward-forward vs. backpropagation for self-supervised representation learning.
Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-supervised) training, the transfer performance is significantly lagging behind in all the studied settings.
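For context, the forward-forward algorithm replaces backpropagation with a per-layer objective on a scalar "goodness" of the activations: a sum of squares pushed above a threshold for positive data and below it for negative data. A minimal numpy sketch, with the threshold value chosen arbitrarily:

```python
import numpy as np

def goodness(h):
    # Forward-Forward "goodness": sum of squared activations in a layer
    return np.sum(h ** 2)

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Per-layer logistic loss: drive goodness above theta for positive
    inputs and below theta for negative inputs (no backprop across layers)."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    return (-np.log(sigmoid(goodness(h_pos) - theta))
            - np.log(sigmoid(theta - goodness(h_neg))))
```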
arXiv Detail & Related papers (2023-09-21T10:14:53Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - MSVQ: Self-Supervised Learning with Multiple Sample Views and Queues [10.327408694770709]
We propose a new, simple framework named Multiple Sample Views and Queues (MSVQ).
We jointly construct three soft labels on the fly using two complementary and symmetric approaches.
The student network then mimics the similarity relationships between samples, giving it a more flexible ability to identify false negative samples in the dataset.
arXiv Detail & Related papers (2023-05-09T12:05:14Z) - Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features.
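A siamese-style metric loss of this kind can be illustrated with a toy sketch: a pull term toward the sample's own prototype and a hinge push away from the other prototypes. The function name, the squared-distance pull, and the margin are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def prototypical_metric_loss(z, p_own, p_others, margin=1.0):
    # pull: squared distance between the feature and its own prototype
    pull = np.sum((z - p_own) ** 2)
    # push: hinge penalty whenever another prototype lies inside the margin
    push = sum(max(0.0, margin - np.linalg.norm(z - p)) for p in p_others)
    return pull + push
```

The loss is zero exactly when the feature sits on its own prototype and every other prototype is at least `margin` away, matching the intra-/inter-prototype intuition in the summary.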
arXiv Detail & Related papers (2022-08-18T13:25:30Z) - Relational Surrogate Loss Learning [41.61184221367546]
This paper revisits surrogate loss learning, where a deep neural network is employed to approximate evaluation metrics.
We show that it suffices for the surrogate loss to directly preserve the relation among models that the metrics induce.
Our method is much easier to optimize and enjoys significant efficiency and performance gains.
arXiv Detail & Related papers (2022-02-26T17:32:57Z) - A Survey on Contrastive Self-supervised Learning [0.0]
Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets.
Contrastive learning has recently become a dominant component in self-supervised learning methods for computer vision, natural language processing (NLP), and other domains.
This paper provides an extensive review of self-supervised methods that follow the contrastive approach.
arXiv Detail & Related papers (2020-10-31T21:05:04Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.