Understanding Contrastive Learning via Distributionally Robust Optimization
- URL: http://arxiv.org/abs/2310.11048v1
- Date: Tue, 17 Oct 2023 07:32:59 GMT
- Title: Understanding Contrastive Learning via Distributionally Robust Optimization
- Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, Xiangnan He
- Abstract summary: This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
- Score: 29.202594242468678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study reveals the inherent tolerance of contrastive learning (CL)
towards sampling bias, wherein negative samples may encompass similar semantics
(e.g., labels). However, existing theories fall short in providing explanations
for this phenomenon. We bridge this research gap by analyzing CL through the
lens of distributionally robust optimization (DRO), yielding several key
insights: (1) CL essentially conducts DRO over the negative sampling
distribution, thus enabling robust performance across a variety of potential
distributions and demonstrating robustness to sampling bias; (2) The design of
the temperature $\tau$ is not merely heuristic but acts as a Lagrange
Coefficient, regulating the size of the potential distribution set; (3) A
theoretical connection is established between DRO and mutual information, thus
presenting fresh evidence for ``InfoNCE as an estimate of MI'' and a new
estimation approach for $\phi$-divergence-based generalized mutual information.
We also identify CL's potential shortcomings, including over-conservatism and
sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to
mitigate these issues. It refines the potential distribution, improving performance
and accelerating convergence. Extensive experiments on various domains (image,
sentence, and graphs) validate the effectiveness of the proposal. The code is
available at \url{https://github.com/junkangwu/ADNCE}.
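To make the abstract's DRO reading concrete, below is a minimal NumPy sketch; it is not the authors' implementation (see the repository above for the real one). It evaluates the standard InfoNCE loss, numerically checks the KL-duality behind insights (1) and (2), in which the temperature tau multiplies the KL penalty and thus plays the role of the Lagrange coefficient, and applies an illustrative Gaussian reweighting of negatives in the spirit of ADNCE. The similarity scores and the mu/sigma knobs are made up for illustration.

```python
# Illustrative sketch only (NumPy): InfoNCE, its DRO duality, and an
# ADNCE-style reweighting. Not the paper's code; scores are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
tau = 0.1                              # temperature / Lagrange coefficient
s_pos = 0.9                            # similarity of the positive pair
s_neg = rng.uniform(-1.0, 1.0, 128)    # similarities of negative samples

def info_nce(s_pos, s_neg, tau):
    """Standard InfoNCE: -log softmax of the positive among all candidates."""
    logits = np.concatenate(([s_pos], s_neg)) / tau
    return -(s_pos / tau - np.log(np.exp(logits).sum()))

# DRO reading of the negative term: tau * log E_p[exp(s/tau)] is the dual of
# a KL-regularized worst case over negative-sampling distributions q,
#   tau * log E_p[exp(s/tau)] = max_q { E_q[s] - tau * KL(q || p) },
# attained at the exponentially tilted q* proportional to p * exp(s/tau).
p = np.full(s_neg.shape, 1.0 / s_neg.size)   # nominal uniform negatives
dual = tau * np.log((p * np.exp(s_neg / tau)).sum())
q = p * np.exp(s_neg / tau)
q /= q.sum()
primal = (q * s_neg).sum() - tau * (q * np.log(q / p)).sum()
assert np.isclose(dual, primal)              # duality holds numerically

def adjusted_info_nce(s_pos, s_neg, tau, mu=0.3, sigma=0.5):
    """ADNCE-style variant (illustrative): reshape the worst-case weights
    with a Gaussian profile so extreme-similarity outliers are down-weighted.
    mu and sigma are hypothetical knobs, not the paper's exact parameterization."""
    w = np.exp(-0.5 * ((s_neg - mu) / sigma) ** 2)
    w *= s_neg.size / w.sum()                # keep the average weight at 1
    neg_term = (w * np.exp(s_neg / tau)).sum()
    return -(s_pos / tau - np.log(np.exp(s_pos / tau) + neg_term))

print(f"InfoNCE: {info_nce(s_pos, s_neg, tau):.4f}")
print(f"ADNCE  : {adjusted_info_nce(s_pos, s_neg, tau):.4f}")
```

Changing the reweighting profile changes which negatives dominate the worst-case distribution, which is the lever the abstract describes ADNCE using against over-conservatism and outlier sensitivity.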
Related papers
- A Semiparametric Approach to Causal Inference [2.092897805817524]
In causal inference, an important problem is to quantify the effects of interventions or treatments.
In this paper, we employ a semiparametric density ratio model (DRM) to characterize the counterfactual distributions.
Our model offers flexibility by avoiding strict parametric assumptions on the counterfactual distributions.
arXiv Detail & Related papers (2024-11-01T18:03:38Z)
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind classifier-free guidance (CFG) and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about an expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory (a toy sketch of the coefficient combination appears after this list).
That way, the rectified coefficients can be readily pre-computed by traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating the distribution of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc.
arXiv Detail & Related papers (2021-02-14T03:37:22Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
However, no existing method enjoys all the desired properties.
We propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
- The Counterfactual $\chi$-GAN [20.42556178617068]
Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome.
This work proposes a generative adversarial network (GAN)-based model called the Counterfactual $\chi$-GAN (cGAN).
arXiv Detail & Related papers (2020-01-09T17:23:13Z)
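For the Rectified Diffusion Guidance entry above, here is a toy sketch of the coefficient combination at issue; it is an assumption-laden illustration, not code from the paper. Standard classifier-free guidance mixes conditional and unconditional noise predictions with coefficients that sum to one, which the entry says shifts the expectation of the generative distribution; ReCFG relaxes that constraint. All numbers below are placeholders.

```python
# Toy CFG coefficient sketch (hypothetical values throughout).
import numpy as np

eps_cond = np.array([0.8, -0.2])    # conditional noise prediction (made up)
eps_uncond = np.array([0.5, 0.1])   # unconditional noise prediction (made up)
w = 3.0                             # guidance strength

# Standard CFG: the coefficients (1 + w) and -w sum to one.
eps_cfg = (1 + w) * eps_cond - w * eps_uncond

# ReCFG's relaxation (paraphrased): let the two coefficients vary
# independently; gamma1 and gamma2 here are placeholders, which the paper
# pre-computes by traversing the observed data rather than fixing by hand.
gamma1, gamma2 = 3.8, -2.9
eps_recfg = gamma1 * eps_cond + gamma2 * eps_uncond

print("CFG:  ", eps_cfg)
print("ReCFG:", eps_recfg)
```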
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.