Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
- URL: http://arxiv.org/abs/2207.07874v4
- Date: Sun, 11 Jun 2023 14:24:51 GMT
- Title: Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
- Authors: Zizheng Huang, Haoxing Chen, Ziqi Wen, Chao Zhang, Huaxiong Li, Bo Wang, Chunlin Chen
- Abstract summary: Contrastive learning (CL) continuously achieves significant breakthroughs across multiple domains.
InfoNCE-based methods suffer from some dilemmas, such as the uniformity-tolerance dilemma (UTD) and gradient reduction.
We present a Model-Aware Contrastive Learning (MACL) strategy, whose temperature is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task.
- Score: 11.27589489269041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning (CL) continuously achieves significant breakthroughs
across multiple domains. However, the most common InfoNCE-based methods suffer
from some dilemmas, such as \textit{uniformity-tolerance dilemma} (UTD) and
\textit{gradient reduction}, both of which are related to a $\mathcal{P}_{ij}$
term. It has been identified that UTD can lead to unexpected performance
degradation. We argue that the fixity of temperature is to blame for UTD. To
tackle this challenge, we enrich the CL loss family by presenting a Model-Aware
Contrastive Learning (MACL) strategy, whose temperature is adaptive to the
magnitude of alignment that reflects the basic confidence of the instance
discrimination task, then enables CL loss to adjust the penalty strength for
hard negatives adaptively. Regarding another dilemma, the gradient reduction
issue, we derive the limits of an involved gradient scaling factor, which
allows us to explain from a unified perspective why some recent approaches are
effective with fewer negative samples, and summarily present a gradient
reweighting to escape this dilemma. Extensive empirical results in
vision, sentence, and graph modalities validate our approach's general
improvement for representation learning and downstream tasks.
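The idea of an alignment-adaptive temperature can be sketched in a few lines. The abstract does not state the exact schedule, so the linear rule below (`tau = tau0 * (1 + alpha * (A - a0))`, with `A` the mean positive-pair cosine similarity), the parameter names `tau0`, `alpha`, `a0`, `min_tau`, and all helper functions are assumptions made for illustration, not the authors' implementation. The gradient reweighting mentioned in the abstract is not covered here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two plain Python vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def adaptive_temperature(pos_sims, tau0=0.1, alpha=0.5, a0=0.0, min_tau=1e-4):
    # Hypothetical schedule (an assumption, not taken from the source):
    # the temperature grows linearly with the mean positive-pair similarity,
    # so stronger alignment gives a larger temperature.
    alignment = sum(pos_sims) / len(pos_sims)
    tau = tau0 * (1.0 + alpha * (alignment - a0))
    # Clamp so the temperature stays strictly positive.
    return max(tau, min_tau)

def info_nce(view1, view2, tau):
    # Standard InfoNCE over a batch: view1[i] and view2[i] form the positive
    # pair, and every view2[j] with j != i serves as a negative for view1[i].
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / n

def model_aware_loss(view1, view2, tau0=0.1, alpha=0.5, a0=0.0):
    # Compute the temperature from the current alignment, then apply InfoNCE.
    pos_sims = [cosine(a, b) for a, b in zip(view1, view2)]
    tau = adaptive_temperature(pos_sims, tau0, alpha, a0)
    return info_nce(view1, view2, tau), tau

if __name__ == "__main__":
    v1 = [[1.0, 0.0], [0.0, 1.0]]
    v2 = [[0.9, 0.1], [0.2, 0.8]]
    loss, tau = model_aware_loss(v1, v2)
    print(f"temperature={tau:.4f} loss={loss:.4f}")
```

In an autodiff framework, the temperature computed this way would typically be treated as a constant (no gradient flowing through the alignment estimate); that choice is also an assumption of this sketch.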
Related papers
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
- Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z)
- Distortion-Disentangled Contrastive Learning [13.27998440853596]
We propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL) and a Distortion-Disentangled Loss (DDL).
Our approach is the first to explicitly disentangle and exploit the DVR inside the model and feature stream to improve the overall representation utilization efficiency, robustness and representation ability.
arXiv Detail & Related papers (2023-03-09T06:33:31Z)
- Stabilizing Off-Policy Deep Reinforcement Learning from Pixels [9.998078491879145]
Off-policy reinforcement learning from pixel observations is notoriously unstable.
We show that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards.
We propose A-LIX, a method providing adaptive regularization to the encoder's gradients that explicitly prevents the occurrence of catastrophic self-overfitting.
arXiv Detail & Related papers (2022-07-03T08:52:40Z)
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
- Towards the Semantic Weak Generalization Problem in Generative Zero-Shot Learning: Ante-hoc and Post-hoc [89.68803484284408]
We present a simple and effective strategy for lowering the previously unexplored factors that limit the performance ceiling of generative Zero-Shot Learning (ZSL).
We begin by formally defining semantic generalization, then look into approaches for reducing the semantic weak generalization problem.
In the ante-hoc phase, we augment the generator's semantic input, as well as relax the fitting target of the generator.
arXiv Detail & Related papers (2022-04-24T13:54:42Z)
- Bilevel learning of l1-regularizers with closed-form gradients (BLORC) [8.138650738423722]
We present a method for supervised learning of sparsity-promoting regularizers.
The parameters are learned to minimize the mean squared error of reconstruction on a training set of ground truth signal and measurement pairs.
arXiv Detail & Related papers (2021-11-21T17:01:29Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)