Decoupled Contrastive Learning
- URL: http://arxiv.org/abs/2110.06848v1
- Date: Wed, 13 Oct 2021 16:38:43 GMT
- Title: Decoupled Contrastive Learning
- Authors: Chun-Hsiao Yeh, Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu, Yubei Chen
and Yann LeCun
- Abstract summary: We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves $66.9\%$ ImageNet top-1 accuracy using batch size 256 within 200 epochs of pre-training, outperforming its SimCLR baseline by $5.1\%$.
- Score: 23.25775900388382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning (CL) is one of the most successful paradigms for
self-supervised learning (SSL). In a principled way, it considers two augmented
``views'' of the same image as positives to be pulled closer, and all other
images as negatives to be pushed further apart. However, behind the impressive
success of CL-based techniques, their formulation often relies on
computation-heavy settings, including large sample batches, extensive training
epochs, etc. We are thus motivated to tackle these issues and aim at
establishing a simple, efficient, and yet competitive baseline of contrastive
learning. Specifically, we identify, from theoretical and empirical studies, a
noticeable negative-positive-coupling (NPC) effect in the widely used
cross-entropy (InfoNCE) loss, which leads to degraded learning efficiency with
respect to the batch size. Indeed, the phenomenon tends to be overlooked, since
optimizing the InfoNCE loss with a small batch is still effective for easier
SSL tasks. By properly addressing the NPC effect, we reach a decoupled
contrastive learning (DCL) objective function, significantly improving SSL
efficiency. DCL achieves competitive performance, requiring neither the large
batches of SimCLR, the momentum encoder of MoCo, nor long training schedules. We demonstrate
the usefulness of DCL on various benchmarks and show its robustness: it is
much less sensitive to suboptimal hyperparameters. Notably, our approach
achieves $66.9\%$ ImageNet top-1 accuracy using batch size 256 within 200
epochs of pre-training, outperforming its SimCLR baseline by $5.1\%$. With further
optimized hyperparameters, DCL can improve the accuracy to $68.2\%$. We believe
DCL provides a valuable baseline for future contrastive learning-based SSL
studies.
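To make the coupling concrete, here is a minimal PyTorch-style sketch contrasting a SimCLR-style InfoNCE loss with a decoupled variant that drops the positive pair from the denominator, which is the core change the abstract describes. The function name, the single-direction (one-anchor-view) formulation, and the assumption of two batches of embeddings from augmented views are illustrative; this is a sketch of the idea, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_losses(z1, z2, temperature=0.1):
    """Compare SimCLR-style InfoNCE with a decoupled (DCL-style) variant.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    """
    n = z1.size(0)
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Temperature-scaled similarities: cross-view (diagonal = positive pairs)
    # and within-view (diagonal = self-similarity, always excluded).
    cross = z1 @ z2.t() / temperature
    within = z1 @ z1.t() / temperature

    pos = cross.diag()                                   # (N,) positive logits
    eye = torch.eye(n, dtype=torch.bool, device=z1.device)
    negatives = torch.cat([cross.masked_fill(eye, float("-inf")),
                           within.masked_fill(eye, float("-inf"))], dim=1)

    # InfoNCE: the positive also sits in the denominator, coupling its
    # gradient to the negatives (the NPC effect).
    infonce = (-pos + torch.logsumexp(
        torch.cat([pos.unsqueeze(1), negatives], dim=1), dim=1)).mean()

    # DCL-style: identical, except the positive is removed from the denominator.
    dcl = (-pos + torch.logsumexp(negatives, dim=1)).mean()
    return infonce, dcl
```

In the InfoNCE form, the positive similarity appears in both the numerator and the denominator, so the gradient of the positive pair is scaled by a batch-size-dependent coupling factor; removing the positive term from the denominator decouples the positive and negative gradients, which is what the paper credits for the improved small-batch efficiency reported above.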
Related papers
- L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering [33.165094795515785]
Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering.
We propose L2CL, a principled Layer-to-Layer Contrastive Learning framework that contrasts representations from different layers.
We find that L2CL, using only a one-hop contrastive learning paradigm, is able to capture intrinsic semantic structures and improve the quality of node representations.
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating the two types of positive samples equally leads to biased optimization of the intra-category distance.
We propose patch-based self-distillation to transfer knowledge from head to tail classes and relieve the under-representation of tail classes.
- RecDCL: Dual Contrastive Learning for Recommendation [65.6236784430981]
We propose a dual contrastive learning recommendation framework -- RecDCL.
In RecDCL, the feature-wise contrastive learning (FCL) objective is designed to eliminate redundant solutions on user-item positive pairs.
The batch-wise contrastive learning (BCL) objective is utilized to generate contrastive embeddings on output vectors, enhancing the robustness of the representations.
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
- Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models [3.759936323189417]
Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks.
In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of the RoBERTa language model.
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).
- Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
- Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss from self-supervised learning with the cross-entropy loss from semi-supervised learning; a generic sketch of this combination appears after this list.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
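As a companion to the SsCL entry above, here is a generic, hypothetical sketch of combining a supervised cross-entropy term with a contrastive term in one objective. The function name, the fixed weighting, and the plain cross-view NT-Xent term are assumptions for illustration; the actual SsCL similarity co-calibration scheme is more involved:

```python
import torch
import torch.nn.functional as F

def joint_semi_supervised_loss(logits, labels, z1, z2,
                               temperature=0.1, weight=1.0):
    """Hypothetical joint objective: cross-entropy on labeled examples plus a
    SimCLR-style contrastive term on two views of unlabeled examples."""
    # Supervised term on the labeled batch.
    ce = F.cross_entropy(logits, labels)

    # Cross-view NT-Xent: each sample's positive is its own other view,
    # so the target for row i of the similarity matrix is index i.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature          # (N, N), diagonal = positives
    targets = torch.arange(z1.size(0), device=z1.device)
    contrastive = F.cross_entropy(sim, targets)

    return ce + weight * contrastive
```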