Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization
- URL: http://arxiv.org/abs/2107.01131v1
- Date: Fri, 2 Jul 2021 15:20:41 GMT
- Title: Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization
- Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao
- Abstract summary: We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
- Score: 69.07420650261649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Successful applications of InfoNCE and its variants have popularized the use
of contrastive variational mutual information (MI) estimators in machine
learning. While featuring superior stability, these estimators crucially depend
on costly large-batch training, and they sacrifice bound tightness for variance
reduction. To overcome these limitations, we revisit the mathematics of popular
variational MI bounds from the lens of unnormalized statistical modeling and
convex optimization. Our investigation not only yields a new unified
theoretical framework encompassing popular variational MI bounds but also leads
to a novel, simple, and powerful contrastive MI estimator named FLO.
Theoretically, we show that the FLO estimator is tight, and it provably
converges under stochastic gradient descent. Empirically, our FLO estimator
overcomes the limitations of its predecessors and learns more efficiently. The
utility of FLO is verified using an extensive set of benchmarks, which also
reveals the trade-offs in practical MI estimation.
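To make the contrast concrete, here is a minimal PyTorch-style sketch of the InfoNCE lower bound the abstract critiques (it needs a batch of K negatives and saturates at log K), together with a hedged sketch of a Fenchel-Legendre objective in the spirit of FLO. The critic `g` and the auxiliary critic `u` are illustrative assumptions; the paper's exact parameterization and constants may differ.

```python
import math
import torch

def infonce_lower_bound(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE MI lower bound from a [K, K] critic matrix.

    scores[i, j] = g(x_i, y_j); the diagonal holds the positive pairs.
    The bound cannot exceed log(K), hence the need for large batches.
    """
    K = scores.shape[0]
    log_ratios = torch.diagonal(torch.log_softmax(scores, dim=1))
    return log_ratios.mean() + math.log(K)

def flo_style_lower_bound(g_pos: torch.Tensor,
                          g_neg: torch.Tensor,
                          u_pos: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of a Fenchel-Legendre (FLO-style) MI lower bound.

    g_pos: critic g(x, y) on paired samples, shape [B]
    g_neg: critic g(x, y') on independently shuffled pairs, shape [B]
    u_pos: auxiliary critic u(x, y) on paired samples, shape [B]

    Uses the dual identity -log(v) = max_u {-u - exp(-u) * v + 1}
    applied to v = exp(g_neg - g_pos), so only a single negative per
    positive is required; the published FLO objective may differ in detail.
    """
    v = torch.exp(g_neg - g_pos)
    return (-u_pos - torch.exp(-u_pos) * v + 1.0).mean()
```

In a training loop, `g` and `u` would be small neural networks optimized jointly by maximizing the bound with stochastic gradients; per the abstract, the FLO objective is tight and not capped by the log K saturation that limits InfoNCE.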
Related papers
- Measuring Variable Importance in Individual Treatment Effect Estimation with High Dimensional Data [35.104681814241104]
Causal machine learning (ML) promises to provide powerful tools for estimating individual treatment effects.
ML methods still face the significant challenge of interpretability, which is crucial for medical applications.
We propose a new algorithm based on the Conditional Permutation Importance (CPI) method for statistically rigorous variable importance assessment.
arXiv Detail & Related papers (2024-08-23T11:44:07Z)
- On the Convergence of Zeroth-Order Federated Tuning for Large Language Models [36.277423093218275]
Federated Learning and Large Language Models (LLMs) are ushering in a new era in privacy-preserving natural language processing.
Combining Federated Learning with Memory-efficient Zeroth-Order Optimization yields a synergy termed FedMeZO.
Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs.
arXiv Detail & Related papers (2024-02-08T18:56:40Z)
- f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization [9.591164070876689]
This paper presents a unified optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM).
Our extension is based on a distributionally robust optimization reformulation of the f-FERM objective under $L_p$ norms as uncertainty sets.
Our experiments demonstrate the superiority of the fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes.
arXiv Detail & Related papers (2023-12-06T03:14:16Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose FedGMM, a novel approach to Personalized Federated Learning (PFL) that utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on Mutual WasserStein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- On Tilted Losses in Machine Learning: Theory and Applications [26.87656095874882]
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization.
We study a simple extension of empirical risk minimization (ERM) that uses exponential tilting to flexibly tune the impact of individual losses.
We find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
arXiv Detail & Related papers (2021-09-13T17:33:42Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce an MI minimization training scheme and further accelerate it with a negative sampling strategy; a minimal sketch of the CLUB bound follows this list.
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
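For contrast with the lower bounds sketched above, here is a hedged sketch of the CLUB estimator from the last entry: with a variational approximation q(y|x) of p(y|x), CLUB averages log q(y|x) on paired samples and subtracts its average on independently drawn pairs. The shuffling trick and the `log_prob` interface below are illustrative assumptions, not necessarily the authors' implementation.

```python
import torch

def club_upper_bound(log_q_pos: torch.Tensor, log_q_neg: torch.Tensor) -> torch.Tensor:
    """CLUB(X;Y) = E_p(x,y)[log q(y|x)] - E_p(x)p(y)[log q(y|x)].

    log_q_pos: log q(y_i | x_i) on paired samples, shape [B]
    log_q_neg: log q(y_j | x_i) on independently formed pairs, shape [B]
    Upper-bounds I(X;Y) when q(y|x) approximates p(y|x) well.
    """
    return log_q_pos.mean() - log_q_neg.mean()

# Usage sketch: form negative pairs by shuffling y within the batch, assuming
# a model q exposing a hypothetical log_prob(y, x) method for q(y|x).
# batch_x, batch_y = ...                      # paired minibatch
# perm = torch.randperm(batch_y.shape[0])
# mi_upper = club_upper_bound(q.log_prob(batch_y, batch_x),
#                             q.log_prob(batch_y[perm], batch_x))
```

In the MI minimization scheme the entry mentions, q(y|x) is typically fit by maximum likelihood on the paired samples while this upper bound is minimized with respect to the learned representation.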