Global Contrastive Batch Sampling via Optimization on Sample Permutations
- URL: http://arxiv.org/abs/2210.12874v4
- Date: Wed, 7 Jun 2023 04:38:19 GMT
- Title: Global Contrastive Batch Sampling via Optimization on Sample Permutations
- Authors: Vin Sachidananda, Ziyi Yang, Chenguang Zhu
- Abstract summary: Global Contrastive Batch Sampling (GCBS) is an efficient approximation to the batch assignment problem.
GCBS improves state-of-the-art performance in sentence embedding and code-search tasks.
- Score: 28.72288652451881
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Contrastive Learning has recently achieved state-of-the-art performance in a
wide range of tasks. Many contrastive learning approaches use mined hard
negatives to make batches more informative during training but these approaches
are inefficient as they increase epoch length proportional to the number of
mined negatives and require frequent updates of nearest neighbor indices or
mining from recent batches. In this work, we provide an alternative to hard
negative mining, Global Contrastive Batch Sampling (GCBS), an efficient
approximation to the batch assignment problem that upper bounds the gap between
the global and training losses, $\mathcal{L}^{Global} - \mathcal{L}^{Train}$,
in contrastive learning settings. Through experimentation we find GCBS improves
state-of-the-art performance in sentence embedding and code-search tasks.
Additionally, GCBS is easy to implement as it requires only a few additional
lines of code, does not maintain external data structures such as nearest
neighbor indices, is more computationally efficient than the most minimal hard
negative mining approaches, and makes no changes to the model being trained.
Related papers
- BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
We show our approach enables principled sampling of near-optimal-utility batches at inference time, with a single forward pass per point in the batch, on toy regression problems.
arXiv Detail & Related papers (2023-06-26T20:41:36Z)
- Neighborhood-based Hard Negative Mining for Sequential Recommendation [14.66576013324401]
Negative sampling plays a crucial role in training successful sequential recommendation models.
We propose a Graph-based Negative sampling approach based on Neighborhood Overlap (GNNO) to exploit structural information hidden in user behaviors for negative mining.
arXiv Detail & Related papers (2023-06-12T12:28:54Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- A Maximum Log-Likelihood Method for Imbalanced Few-Shot Learning Tasks [3.2895195535353308]
We propose a new maximum log-likelihood metric for few-shot architectures.
We demonstrate that the proposed metric achieves superior accuracy compared with conventional similarity metrics.
We also show that our algorithm achieves state-of-the-art transductive few-shot performance when the evaluation data is imbalanced.
arXiv Detail & Related papers (2022-11-26T21:31:00Z)
- LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning [17.571160136568455]
We propose a novel approach that looks for optimal hard negatives (LoOp) in the embedding space.
Unlike mining-based methods, our approach considers the entire space between pairs of embeddings to calculate the optimal hard negatives.
arXiv Detail & Related papers (2021-08-20T19:21:33Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification [114.56752624945142]
We argue that the most popular random sampling method, the well-known PK sampler, is neither informative nor efficient for deep metric learning.
We propose an efficient mini-batch sampling method called Graph Sampling (GS) for large-scale metric learning.
arXiv Detail & Related papers (2021-04-04T06:44:15Z)
- Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks [3.075766050800645]
Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that our skewed neighbor sampling does not significantly affect the convergence of training.
arXiv Detail & Related papers (2021-01-19T16:12:44Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as 'least realistic'.
We show that this 'top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)
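For reference, the 'top-k update' described in the last entry amounts to keeping only the k batch elements the critic scores highest when computing the generator loss. Below is a minimal sketch assuming a standard non-saturating GAN objective; the helper name `topk_generator_loss` and the specific loss form are illustrative rather than the cited paper's exact code.

```python
import torch
import torch.nn.functional as F

def topk_generator_loss(critic_scores: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k most-realistic samples (per the critic) in the generator
    loss, so the least-realistic samples contribute zero gradient."""
    kept, _ = torch.topk(critic_scores, k)
    # non-saturating generator loss on the retained samples only
    return -F.logsigmoid(kept).mean()

# Usage in a training step (all names are placeholders):
#   fake = generator(noise)
#   loss_g = topk_generator_loss(critic(fake).squeeze(-1), k=batch_size // 2)
#   loss_g.backward()
```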