Understanding Contrastive Learning via Distributionally Robust
Optimization
- URL: http://arxiv.org/abs/2310.11048v1
- Date: Tue, 17 Oct 2023 07:32:59 GMT
- Title: Understanding Contrastive Learning via Distributionally Robust
Optimization
- Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, Xiangnan
He
- Abstract summary: This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
- Score: 29.202594242468678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study reveals the inherent tolerance of contrastive learning (CL)
towards sampling bias, wherein negative samples may encompass similar semantics
(e.g., labels). However, existing theories fall short in providing explanations
for this phenomenon. We bridge this research gap by analyzing CL through the
lens of distributionally robust optimization (DRO), yielding several key
insights: (1) CL essentially conducts DRO over the negative sampling
distribution, thus enabling robust performance across a variety of potential
distributions and demonstrating robustness to sampling bias; (2) The design of
the temperature $\tau$ is not merely heuristic but acts as a Lagrange
Coefficient, regulating the size of the potential distribution set; (3) A
theoretical connection is established between DRO and mutual information, thus
presenting fresh evidence for "InfoNCE as an estimate of MI" and a new
estimation approach for $\phi$-divergence-based generalized mutual information.
We also identify CL's potential shortcomings, including over-conservatism and
sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to
mitigate these issues. It refines the potential distribution, improving performance
and accelerating convergence. Extensive experiments on various domains (image,
sentence, and graphs) validate the effectiveness of the proposal. The code is
available at https://github.com/junkangwu/ADNCE.
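To make insights (1) and (2) concrete, the connection rests on the standard duality between KL-constrained DRO and a log-sum-exp objective. The following is a sketch, with notation introduced here for illustration: $P_0$ is the nominal negative-sampling distribution, $s(x, x^-)$ the anchor-negative similarity score, and $\eta$ the radius of the ambiguity set:

$$ \max_{Q:\, D_{\mathrm{KL}}(Q \,\|\, P_0)\le\eta}\ \mathbb{E}_{x^-\sim Q}\big[s(x,x^-)\big] \;=\; \min_{\tau\ge 0}\ \tau\log\mathbb{E}_{x^-\sim P_0}\big[e^{s(x,x^-)/\tau}\big] + \tau\eta. $$

For a fixed $\tau$, the right-hand side is (up to additive constants) the negative-sample term of InfoNCE, which is why the temperature can be read as the Lagrange coefficient attached to the KL-radius constraint.

Below is a minimal, self-contained sketch of the InfoNCE loss with temperature $\tau$ and in-batch negatives, i.e., the quantity the analysis above refers to. The function name, tensor shapes, and the choice of PyTorch are illustrative assumptions; this is not the training code from the linked repository.

import torch
import torch.nn.functional as F

def info_nce(z, z_pos, tau=0.5):
    # z, z_pos: (N, d) anchor and positive embeddings, assumed L2-normalized.
    # Row i of `logits` scores anchor i against every candidate: the diagonal
    # entry is its positive, the off-diagonal entries are in-batch negatives.
    logits = z @ z_pos.t() / tau
    labels = torch.arange(z.size(0), device=z.device)
    # Cross-entropy contains a log-sum-exp over each row, i.e. the tau-scaled
    # term appearing in the DRO dual sketched above.
    return F.cross_entropy(logits, labels)

# Usage sketch with random features standing in for an encoder.
z = F.normalize(torch.randn(128, 64), dim=-1)
z_pos = F.normalize(z + 0.1 * torch.randn(128, 64), dim=-1)
loss = info_nce(z, z_pos, tau=0.5)

The ADNCE loss proposed in the paper adjusts how strongly each negative contributes inside this log-sum-exp, reweighting the worst-case distribution to curb over-conservatism and outlier sensitivity; the exact weighting scheme is in the repository linked above.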
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- A Semiparametric Approach to Causal Inference [2.092897805817524]
In causal inference, an important problem is to quantify the effects of interventions or treatments.
In this paper, we employ a semiparametric density ratio model (DRM) to characterize the counterfactual distributions.
Our model offers flexibility by avoiding strict parametric assumptions on the counterfactual distributions.
arXiv Detail & Related papers (2024-11-01T18:03:38Z)
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about an expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions.
We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating the distribution of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc.
arXiv Detail & Related papers (2021-02-14T03:37:22Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
No existing method, however, enjoys all of the desired properties at once.
We propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
- The Counterfactual $\chi$-GAN [20.42556178617068]
Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome.
This work proposes a generative adversarial network (GAN)-based model called the Counterfactual $\chi$-GAN (cGAN).
arXiv Detail & Related papers (2020-01-09T17:23:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.