Understanding Contrastive Learning via Distributionally Robust Optimization
- URL: http://arxiv.org/abs/2310.11048v1
- Date: Tue, 17 Oct 2023 07:32:59 GMT
- Title: Understanding Contrastive Learning via Distributionally Robust Optimization
- Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, Xiangnan He
- Abstract summary: This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
- Score: 29.202594242468678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study reveals the inherent tolerance of contrastive learning (CL)
towards sampling bias, wherein negative samples may encompass similar semantics
(e.g., labels). However, existing theories fall short in providing explanations
for this phenomenon. We bridge this research gap by analyzing CL through the
lens of distributionally robust optimization (DRO), yielding several key
insights: (1) CL essentially conducts DRO over the negative sampling
distribution, thus enabling robust performance across a variety of potential
distributions and demonstrating robustness to sampling bias; (2) The design of
the temperature $\tau$ is not merely heuristic but acts as a Lagrange
Coefficient, regulating the size of the potential distribution set; (3) A
theoretical connection is established between DRO and mutual information, thus
presenting fresh evidence for ``InfoNCE as an estimate of MI'' and a new
estimation approach for $\phi$-divergence-based generalized mutual information.
We also identify CL's potential shortcomings, including over-conservatism and
sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to
mitigate these issues. It refines the potential distribution, improving performance
and accelerating convergence. Extensive experiments on various domains (image,
sentence, and graphs) validate the effectiveness of the proposal. The code is
available at \url{https://github.com/junkangwu/ADNCE}.
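To make the abstract's DRO reading concrete, below is a minimal NumPy sketch; it is not the authors' implementation (see the repository above for the real one). It evaluates the standard InfoNCE loss, numerically checks the KL-duality behind insights (1) and (2), in which the temperature tau multiplies the KL penalty and thus plays the role of the Lagrange coefficient, and applies an illustrative Gaussian reweighting of negatives in the spirit of ADNCE. The similarity scores and the mu/sigma knobs are made up for illustration.

```python
# Illustrative sketch only (NumPy): InfoNCE, its DRO duality, and an
# ADNCE-style reweighting. Not the paper's code; scores are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
tau = 0.1                              # temperature / Lagrange coefficient
s_pos = 0.9                            # similarity of the positive pair
s_neg = rng.uniform(-1.0, 1.0, 128)    # similarities of negative samples

def info_nce(s_pos, s_neg, tau):
    """Standard InfoNCE: -log softmax of the positive among all candidates."""
    logits = np.concatenate(([s_pos], s_neg)) / tau
    return -(s_pos / tau - np.log(np.exp(logits).sum()))

# DRO reading of the negative term: tau * log E_p[exp(s/tau)] is the dual of
# a KL-regularized worst case over negative-sampling distributions q,
#   tau * log E_p[exp(s/tau)] = max_q { E_q[s] - tau * KL(q || p) },
# attained at the exponentially tilted q* proportional to p * exp(s/tau).
p = np.full(s_neg.shape, 1.0 / s_neg.size)   # nominal uniform negatives
dual = tau * np.log((p * np.exp(s_neg / tau)).sum())
q = p * np.exp(s_neg / tau)
q /= q.sum()
primal = (q * s_neg).sum() - tau * (q * np.log(q / p)).sum()
assert np.isclose(dual, primal)              # duality holds numerically

def adjusted_info_nce(s_pos, s_neg, tau, mu=0.3, sigma=0.5):
    """ADNCE-style variant (illustrative): reshape the worst-case weights
    with a Gaussian profile so extreme-similarity outliers are down-weighted.
    mu and sigma are hypothetical knobs, not the paper's exact parameterization."""
    w = np.exp(-0.5 * ((s_neg - mu) / sigma) ** 2)
    w *= s_neg.size / w.sum()                # keep the average weight at 1
    neg_term = (w * np.exp(s_neg / tau)).sum()
    return -(s_pos / tau - np.log(np.exp(s_pos / tau) + neg_term))

print(f"InfoNCE: {info_nce(s_pos, s_neg, tau):.4f}")
print(f"ADNCE  : {adjusted_info_nce(s_pos, s_neg, tau):.4f}")
```

Changing the reweighting profile changes which negatives dominate the worst-case distribution, which is the lever the abstract describes ADNCE using against over-conservatism and outlier sensitivity.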
Related papers
- A Semiparametric Approach to Causal Inference [2.092897805817524]
In causal inference, an important problem is to quantify the effects of interventions or treatments.
In this paper, we employ a semiparametric density ratio model (DRM) to characterize the counterfactual distributions.
Our model offers flexibility by avoiding strict parametric assumptions on the counterfactual distributions.
arXiv Detail & Related papers (2024-11-01T18:03:38Z)
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind classifier-free guidance (CFG) and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about an expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory (a toy sketch of the coefficient combination appears after this list).
That way, the rectified coefficients can be readily pre-computed by traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating the distribution of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc.
arXiv Detail & Related papers (2021-02-14T03:37:22Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
However, no existing method enjoys all the desired properties.
We propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
- The Counterfactual $\chi$-GAN [20.42556178617068]
Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome.
This work proposes a generative adversarial network (GAN)-based model called the Counterfactual $\chi$-GAN (cGAN).
arXiv Detail & Related papers (2020-01-09T17:23:13Z)
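For the Rectified Diffusion Guidance entry above, here is a toy sketch of the coefficient combination at issue; it is an assumption-laden illustration, not code from the paper. Standard classifier-free guidance mixes conditional and unconditional noise predictions with coefficients that sum to one, which the entry says shifts the expectation of the generative distribution; ReCFG relaxes that constraint. All numbers below are placeholders.

```python
# Toy CFG coefficient sketch (hypothetical values throughout).
import numpy as np

eps_cond = np.array([0.8, -0.2])    # conditional noise prediction (made up)
eps_uncond = np.array([0.5, 0.1])   # unconditional noise prediction (made up)
w = 3.0                             # guidance strength

# Standard CFG: the coefficients (1 + w) and -w sum to one.
eps_cfg = (1 + w) * eps_cond - w * eps_uncond

# ReCFG's relaxation (paraphrased): let the two coefficients vary
# independently; gamma1 and gamma2 here are placeholders, which the paper
# pre-computes by traversing the observed data rather than fixing by hand.
gamma1, gamma2 = 3.8, -2.9
eps_recfg = gamma1 * eps_cond + gamma2 * eps_uncond

print("CFG:  ", eps_cfg)
print("ReCFG:", eps_recfg)
```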
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.