Tight Mutual Information Estimation With Contrastive Fenchel-Legendre
Optimization
- URL: http://arxiv.org/abs/2107.01131v1
- Date: Fri, 2 Jul 2021 15:20:41 GMT
- Title: Tight Mutual Information Estimation With Contrastive Fenchel-Legendre
Optimization
- Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence
Carin, Fan Li, Chenyang Tao
- Abstract summary: We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
- Score: 69.07420650261649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Successful applications of InfoNCE and its variants have popularized the use
of contrastive variational mutual information (MI) estimators in machine
learning. While featuring superior stability, these estimators crucially depend
on costly large-batch training, and they sacrifice bound tightness for variance
reduction. To overcome these limitations, we revisit the mathematics of popular
variational MI bounds from the lens of unnormalized statistical modeling and
convex optimization. Our investigation not only yields a new unified
theoretical framework encompassing popular variational MI bounds but also leads
to a novel, simple, and powerful contrastive MI estimator named FLO.
Theoretically, we show that the FLO estimator is tight, and it provably
converges under stochastic gradient descent. Empirically, our FLO estimator
overcomes the limitations of its predecessors and learns more efficiently. The
utility of FLO is verified using an extensive set of benchmarks, which also
reveals the trade-offs in practical MI estimation.
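
For concreteness, below is a minimal NumPy sketch of the InfoNCE lower bound that the abstract takes as its point of departure; it is not the FLO estimator itself, and the jointly Gaussian toy problem, the use of the true log-density ratio as the critic, and all variable names are illustrative assumptions.

```python
import numpy as np

def infonce_bound(scores):
    """InfoNCE lower bound on I(X;Y) from a K x K critic-score matrix,
    where scores[i, j] = f(x_i, y_j) and the diagonal holds positive pairs."""
    K = scores.shape[0]
    # log of the mean of exp(scores) over each row (contrastive normalizer)
    log_norm = np.log(np.mean(np.exp(scores), axis=1))
    return np.mean(np.diag(scores) - log_norm)  # cannot exceed log K

# Toy data: (X, Y) jointly Gaussian with correlation rho, so I(X;Y) = -0.5*log(1 - rho^2).
rng = np.random.default_rng(0)
rho, K = 0.8, 512
x = rng.standard_normal(K)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(K)

# Use the true log-density ratio log p(y|x) - log p(y) as the (optimal) critic.
def critic(x, y):
    return (-0.5 * np.log(1 - rho**2)
            - (y - rho * x) ** 2 / (2 * (1 - rho**2))
            + y ** 2 / 2)

scores = critic(x[:, None], y[None, :])   # scores[i, j] = f(x_i, y_j)
print("true MI:", -0.5 * np.log(1 - rho**2))
print("InfoNCE estimate:", infonce_bound(scores))
```

Because the InfoNCE estimate cannot exceed log K for a batch of K contrastive samples, tight estimation of large MI values forces large batches; this is the trade-off between tightness and variance that the abstract refers to and that FLO is designed to overcome.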
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE).
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - On the Convergence of Zeroth-Order Federated Tuning for Large Language Models [36.277423093218275]
Federated Learning and Large Language Models (LLMs) are ushering in a new era in privacy-preserving natural language processing.
We integrate federated learning with memory-efficient zeroth-order optimization, a synergy we term FedMeZO.
Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs.
arXiv Detail & Related papers (2024-02-08T18:56:40Z) - f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization [9.591164070876689]
This paper presents a unified optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM).
In addition, our experiments demonstrate the superiority of the fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes.
Our extension is based on a distributionally robust optimization reformulation of the f-FERM objective under $L_p$ norms as uncertainty sets.
arXiv Detail & Related papers (2023-12-06T03:14:16Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper, we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically (a minimal sketch of the MMD estimator appears after this list).
arXiv Detail & Related papers (2021-10-15T21:29:27Z) - On Tilted Losses in Machine Learning: Theory and Applications [26.87656095874882]
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization.
We study a simple extension of ERM that uses exponential tilting to flexibly tune the impact of individual losses (the tilted objective is sketched after this list).
We find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
arXiv Detail & Related papers (2021-09-13T17:33:42Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce an MI minimization training scheme and further accelerate it with a negative sampling strategy (a minimal sketch of the CLUB bound appears after this list).
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.