Mini-Batch Optimization of Contrastive Loss
- URL: http://arxiv.org/abs/2307.05906v1
- Date: Wed, 12 Jul 2023 04:23:26 GMT
- Title: Mini-Batch Optimization of Contrastive Loss
- Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi,
Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
- Abstract summary: We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected.
We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches.
- Score: 13.730030395850358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has gained significant attention as a method for
self-supervised learning. The contrastive loss function ensures that embeddings
of positive sample pairs (e.g., different samples from the same class or
different views of the same object) are similar, while embeddings of negative
pairs are dissimilar. Practical constraints such as large memory requirements
make it challenging to consider all possible positive and negative pairs,
leading to the use of mini-batch optimization. In this paper, we investigate
the theoretical aspects of mini-batch optimization in contrastive learning. We
show that mini-batch optimization is equivalent to full-batch optimization if
and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality
may arise when examining only a subset. We then demonstrate that utilizing
high-loss mini-batches can speed up SGD convergence and propose a spectral
clustering-based approach for identifying these high-loss mini-batches. Our
experimental results validate our theoretical findings and demonstrate that our
proposed algorithm outperforms vanilla SGD in practically relevant settings,
providing a better understanding of mini-batch optimization in contrastive
learning.
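To make the two objectives concrete, here is a minimal numerical sketch (not the paper's exact formulation or training algorithm): it computes an InfoNCE-style full-batch contrastive loss and the average loss over mini-batches, both over all $\binom{N}{B}$ mini-batches and over a small random subset. The loss form, embeddings, and sampling scheme are illustrative assumptions; the paper's result concerns equivalence of the minimizers when all mini-batches are used, not numerical equality of the loss values.

```python
# Illustrative sketch (not the paper's exact formulation): compares the
# full-batch InfoNCE-style contrastive loss with the average loss over
# mini-batches, for all C(N, B) mini-batches vs. a random subset of them.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, B, d, tau = 8, 4, 16, 0.5

# Paired embeddings: u[i] and v[i] form a positive pair; other v[j] are negatives.
u = rng.normal(size=(N, d)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(N, d)); v /= np.linalg.norm(v, axis=1, keepdims=True)

def contrastive_loss(idx):
    """InfoNCE-style loss restricted to the samples in `idx`."""
    s = (u[idx] @ v[idx].T) / tau                       # similarity matrix for the batch
    log_probs = s - np.logaddexp.reduce(s, axis=1, keepdims=True)
    return -np.mean(np.diag(log_probs))                 # positives sit on the diagonal

full_batch = contrastive_loss(np.arange(N))

all_batches = list(itertools.combinations(range(N), B))              # all C(N, B) mini-batches
avg_all = np.mean([contrastive_loss(np.array(b)) for b in all_batches])

subset = rng.choice(len(all_batches), size=5, replace=False)         # only a few mini-batches
avg_subset = np.mean([contrastive_loss(np.array(all_batches[i])) for i in subset])

print(f"full-batch loss            : {full_batch:.4f}")
print(f"avg over all C(N,B) batches: {avg_all:.4f}")
print(f"avg over a small subset    : {avg_subset:.4f}")
```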
Related papers
- Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge.
Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently.
Our approach yields a nearly 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z)
- TS-RSR: A provably efficient approach for batch Bayesian optimization [4.622871908358325]
This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio (TS-RSR) directed sampling.
Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points.
We demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions.
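As a rough illustration of the regret-to-uncertainty idea (not the authors' exact TS-RSR acquisition), the sketch below scores candidates by an estimated regret divided by the posterior standard deviation under a Gaussian-process surrogate, and fantasizes already-selected points to discourage redundancy within a batch; the fantasy update and toy objective are assumptions.

```python
# Rough illustration only, not the authors' TS-RSR acquisition. It scores
# candidates by (estimated regret) / (posterior std), using a Thompson sample
# of the optimum for the regret estimate, and fantasizes already-chosen points
# to discourage redundancy within the batch (an assumption).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)      # toy objective (minimize)

X = rng.uniform(0, 3, size=(6, 1)); y = f(X).ravel()              # initial design
cand = np.linspace(0, 3, 200).reshape(-1, 1)                      # candidate pool
batch = []

for _ in range(4):                                                 # pick a batch of 4 points
    gp = GaussianProcessRegressor(alpha=1e-3, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(cand, return_std=True)
    ts = gp.sample_y(cand, random_state=int(rng.integers(1 << 31))).ravel()  # Thompson sample
    regret = mu - ts.min()                                         # estimated regret per candidate
    score = regret / np.maximum(sigma, 1e-9)                       # regret-to-sigma ratio
    i = int(np.argmin(score))
    batch.append(cand[i])
    X = np.vstack([X, cand[i:i + 1]])                              # "believer" update: pretend we
    y = np.append(y, mu[i])                                        # observed the posterior mean

print("selected batch:", np.round(np.array(batch).ravel(), 3))
```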
arXiv Detail & Related papers (2024-03-07T18:58:26Z)
- Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
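Below is a minimal sketch of the STORM-style recursive-momentum estimator that methods of this family build on, applied to a toy least-squares problem; the nested compositional and minimax structure of NSTORM itself is not reproduced here.

```python
# Minimal sketch of a STORM-style recursive-momentum gradient estimator, the
# building block behind recursive-momentum methods such as NSTORM; the nested
# compositional/minimax structure of the paper is not reproduced.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5)); b = rng.normal(size=100)            # least-squares toy problem

def stoch_grad(x, idx):
    """Stochastic gradient of 0.5*||Ax - b||^2 on a sampled row subset."""
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x, x_prev = np.zeros(5), np.zeros(5)
d = np.zeros(5)                                                     # recursive momentum estimator
eta, a = 0.05, 0.1

for t in range(500):
    idx = rng.choice(100, size=5, replace=False)                    # fresh mini-batch each step
    if t == 0:
        d = stoch_grad(x, idx)
    else:
        # d_t = grad(x_t) + (1 - a) * (d_{t-1} - grad(x_{t-1})), same mini-batch
        d = stoch_grad(x, idx) + (1 - a) * (d - stoch_grad(x_prev, idx))
    x_prev, x = x, x - eta * d

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```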
arXiv Detail & Related papers (2023-08-18T14:57:21Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve federated minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize PAUC approximately, leading to inevitable and uncontrollable biases.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization (DRO).
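For reference, one-way partial AUC can be evaluated with scikit-learn's max_fpr argument, as in the sketch below; this only measures the metric on synthetic scores and does not implement the paper's DRO-based optimization.

```python
# Quick check of what one-way partial AUC measures: AUC restricted to a low
# false-positive-rate range (here FPR <= 0.1), via scikit-learn's max_fpr,
# which returns the standardized partial AUC. Evaluation only; the paper's
# end-to-end optimization is not shown.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = y_true * 0.6 + rng.normal(scale=0.5, size=1000)          # noisy scores

full_auc = roc_auc_score(y_true, y_score)
opauc_01 = roc_auc_score(y_true, y_score, max_fpr=0.1)             # one-way partial AUC
print(f"full AUC: {full_auc:.3f}   partial AUC (FPR<=0.1): {opauc_01:.3f}")
```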
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
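A small sketch of why the global objective is hard to estimate from small batches (assumptions only, not SogCLR's algorithm): each anchor's log-normalizer involves all negatives in the dataset, and an estimate of it from a small in-batch sample is biased.

```python
# Sketch under assumptions (not SogCLR's algorithm): the "global" contrastive
# objective contrasts each anchor with ALL negatives, so the per-anchor
# log-normalizer depends on every sample; a small in-batch estimate is biased,
# which is the difficulty memory-efficient methods like SogCLR address.
import numpy as np

rng = np.random.default_rng(0)
N, d, tau, B = 512, 32, 0.5, 16
u = rng.normal(size=(N, d)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(N, d)); v /= np.linalg.norm(v, axis=1, keepdims=True)

anchor = 0
sims = (u[anchor] @ v.T) / tau                                     # anchor vs. every sample

# Global per-anchor loss: positive (index 0) against all N-1 negatives.
global_loss = np.logaddexp.reduce(sims) - sims[anchor]

# In-batch estimate: positive against only B-1 sampled negatives.
idx = np.concatenate(([anchor], rng.choice(np.arange(1, N), size=B - 1, replace=False)))
batch_loss = np.logaddexp.reduce(sims[idx]) - sims[anchor]

print(f"global loss: {global_loss:.4f}   small-batch estimate: {batch_loss:.4f}")
```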
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
- Max-Margin Contrastive Learning [120.32963353348674]
We present max-margin contrastive learning (MMCL) for unsupervised representation learning.
Our approach selects negatives as the sparse support vectors obtained via a quadratic optimization problem.
We validate our approach on standard vision benchmark datasets, demonstrating better performance in unsupervised representation learning.
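A loose illustration of the support-vector view (not the paper's exact quadratic program): fit a linear SVM separating an anchor's positive views from candidate negatives and keep the negative-class support vectors as hard negatives for the contrastive loss.

```python
# Loose illustration, not the paper's exact quadratic program: fit a linear
# SVM separating an anchor's positive embeddings from its negatives and keep
# only the negative support vectors as "hard" negatives.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
pos = rng.normal(loc=0.5, size=(8, 32))                            # anchor + positive views
neg = rng.normal(loc=0.0, size=(200, 32))                          # candidate negatives

X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))

svm = SVC(kernel="linear", C=1.0).fit(X, y)
sv_idx = svm.support_                                              # indices of support vectors in X
hard_neg_idx = sv_idx[sv_idx >= len(pos)] - len(pos)               # keep only negative-class SVs

print(f"{len(hard_neg_idx)} of {len(neg)} negatives selected as support vectors")
```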
arXiv Detail & Related papers (2021-12-21T18:56:54Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by directly optimizing multiclass AUC metrics.
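For context, multiclass AUC can be evaluated with scikit-learn's one-vs-one or one-vs-rest averaging, as sketched below; the paper's contribution is optimizing scoring functions for these metrics directly, which is not shown here.

```python
# Evaluating multiclass AUC with scikit-learn's one-vs-one / one-vs-rest
# averaging. This measures the metric only; optimizing for it directly is the
# paper's contribution and is not reproduced here.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=600)                              # 3 classes
logits = rng.normal(size=(600, 3)) + np.eye(3)[y_true]             # mildly informative scores
y_prob = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print("OvO macro AUC:", roc_auc_score(y_true, y_prob, multi_class="ovo", average="macro"))
print("OvR macro AUC:", roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"))
```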
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- Amortized variance reduction for doubly stochastic objectives [17.064916635597417]
Approximate inference in complex probabilistic models requires optimisation of doubly stochastic objective functions.
Current approaches do not take into account how mini-batch stochasticity interacts with sampling stochasticity, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations.
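A bare-bones control-variate sketch under simple assumptions: a correlated statistic with known mean is subtracted to reduce the variance of a Monte Carlo estimator without changing its expectation; the paper's recognition-network amortization of the optimal control variate is only described here, not implemented.

```python
# Bare-bones control-variate sketch: a Monte Carlo estimate of E[f(z)] is
# corrected with c * (h(z) - E[h(z)]) for a correlated h with known mean,
# which lowers variance without changing the expectation. The paper's
# recognition-network amortization of the optimal control variate per
# mini-batch is not reproduced.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=10000)

f = np.exp(z)                          # quantity whose mean we estimate (true mean = e^0.5)
h = z                                  # control variate with known mean E[h] = 0
c = np.cov(f, h)[0, 1] / np.var(h)     # (near-)optimal scalar coefficient

plain = f
controlled = f - c * (h - 0.0)

print(f"plain estimate      : {plain.mean():.4f}  (var {plain.var():.3f})")
print(f"with control variate: {controlled.mean():.4f}  (var {controlled.var():.3f})")
```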
arXiv Detail & Related papers (2020-03-09T13:23:14Z)