Related papers: EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

URL: http://arxiv.org/abs/2404.10575v1
Date: Tue, 16 Apr 2024 13:53:58 GMT
Title: EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Authors: Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
Abstract summary: A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples. We propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We prove that EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size.
Score: 43.96096434967746
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and ImageNet-100.
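
The abstract's central idea, drawing negatives from a softmax distribution without computing its partition function, can be illustrated with a plain Metropolis-Hastings chain. The sketch below is not the paper's adaptive subroutine: it assumes a fixed list of similarity scores, a uniform proposal over candidate indices, and a single temperature, and all function names are hypothetical. It only shows that the acceptance test needs the ratio of two unnormalized softmax weights, so the normalizing sum never appears in the sampler.

```python
import math
import random


def mh_negative_sample(scores, temperature, current, steps, rng):
    """Advance a Metropolis-Hastings chain whose target is softmax(scores / temperature).

    Assumption for illustration: proposals are uniform over all indices, which is
    symmetric, so the acceptance probability reduces to the ratio of unnormalized
    softmax weights. The partition function cancels and is never computed.
    """
    for _ in range(steps):
        proposal = rng.randrange(len(scores))
        log_ratio = (scores[proposal] - scores[current]) / temperature
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
    return current


def empirical_frequencies(scores, temperature, num_draws, seed=0):
    """Run one long chain and report how often each index was visited."""
    rng = random.Random(seed)
    counts = [0] * len(scores)
    state = 0
    for _ in range(num_draws):
        state = mh_negative_sample(scores, temperature, state, steps=1, rng=rng)
        counts[state] += 1
    return [c / num_draws for c in counts]


def softmax(scores, temperature):
    """Exact softmax, used here only to check the sampler on a tiny example."""
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]
```

On a small score list, comparing `empirical_frequencies(scores, 1.0, 100000)` with `softmax(scores, 1.0)` shows the visit frequencies approaching the softmax probabilities. In a training loop the scores would change as the encoder is updated, which is the setting the paper's analysis addresses.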

Related papers

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation [59.01400190971061]
Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact. We argue that promoting diversity among samples is not essential; what matters is to select samples that maximise diversity in model responses. Our method, Diversifying Sample Condensation (DISCO), selects the top-k samples with the greatest model disagreements.
arXiv Detail & Related papers (2025-10-09T08:53:59Z)
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning [7.556169113399857]
We propose an end-to-end PLReMix framework by introducing a Pseudo-Label Relaxed (PLR) contrastive loss. The proposed PLR loss is pluggable and we have integrated it into other LNL methods, observing their improved performance.
arXiv Detail & Related papers (2024-02-27T15:22:20Z)
Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift. We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z)
Compressed Sensing: A Discrete Optimization Approach [5.877778007271621]
We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm. When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods.
arXiv Detail & Related papers (2023-06-05T01:29:24Z)
Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle. We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning. The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$. Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner. A key to the success of such contrastive learning methods is how to draw positive and negative samples. In this paper, we propose SCE for unsupervised network embedding using only negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP). We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.