Tight Mutual Information Estimation With Contrastive Fenchel-Legendre
Optimization
- URL: http://arxiv.org/abs/2107.01131v1
- Date: Fri, 2 Jul 2021 15:20:41 GMT
- Title: Tight Mutual Information Estimation With Contrastive Fenchel-Legendre
Optimization
- Authors: Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence
Carin, Fan Li, Chenyang Tao
- Abstract summary: We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
- Score: 69.07420650261649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Successful applications of InfoNCE and its variants have popularized the use
of contrastive variational mutual information (MI) estimators in machine
learning. While featuring superior stability, these estimators crucially depend
on costly large-batch training, and they sacrifice bound tightness for variance
reduction. To overcome these limitations, we revisit the mathematics of popular
variational MI bounds from the lens of unnormalized statistical modeling and
convex optimization. Our investigation not only yields a new unified
theoretical framework encompassing popular variational MI bounds but also leads
to a novel, simple, and powerful contrastive MI estimator named FLO.
Theoretically, we show that the FLO estimator is tight, and it provably
converges under stochastic gradient descent. Empirically, our FLO estimator
overcomes the limitations of its predecessors and learns more efficiently. The
utility of FLO is verified using an extensive set of benchmarks, which also
reveals the trade-offs in practical MI estimation.
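
For concreteness, below is a minimal NumPy sketch of the InfoNCE lower bound that the abstract takes as its point of departure; it is not the FLO estimator itself, and the jointly Gaussian toy problem, the use of the true log-density ratio as the critic, and all variable names are illustrative assumptions.

```python
import numpy as np

def infonce_bound(scores):
    """InfoNCE lower bound on I(X;Y) from a K x K critic-score matrix,
    where scores[i, j] = f(x_i, y_j) and the diagonal holds positive pairs."""
    K = scores.shape[0]
    # log of the mean of exp(scores) over each row (contrastive normalizer)
    log_norm = np.log(np.mean(np.exp(scores), axis=1))
    return np.mean(np.diag(scores) - log_norm)  # cannot exceed log K

# Toy data: (X, Y) jointly Gaussian with correlation rho, so I(X;Y) = -0.5*log(1 - rho^2).
rng = np.random.default_rng(0)
rho, K = 0.8, 512
x = rng.standard_normal(K)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(K)

# Use the true log-density ratio log p(y|x) - log p(y) as the (optimal) critic.
def critic(x, y):
    return (-0.5 * np.log(1 - rho**2)
            - (y - rho * x) ** 2 / (2 * (1 - rho**2))
            + y ** 2 / 2)

scores = critic(x[:, None], y[None, :])   # scores[i, j] = f(x_i, y_j)
print("true MI:", -0.5 * np.log(1 - rho**2))
print("InfoNCE estimate:", infonce_bound(scores))
```

Because the InfoNCE estimate cannot exceed log K for a batch of K contrastive samples, tight estimation of large MI values forces large batches; this is the trade-off between tightness and variance that the abstract refers to and that FLO is designed to overcome.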
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE).
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - On the Convergence of Zeroth-Order Federated Tuning for Large Language Models [36.277423093218275]
Federated Learning and Large Language Models (LLMs) are ushering in a new era in privacy-preserving natural language processing.
We integrate federated learning with memory-efficient zeroth-order optimization, a synergy we term FedMeZO.
Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs.
arXiv Detail & Related papers (2024-02-08T18:56:40Z) - f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization [9.591164070876689]
This paper presents a unified optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM).
In addition, our experiments demonstrate the superiority of the fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes.
Our extension is based on a distributionally robust optimization reformulation of the f-FERM objective under $L_p$ norms as uncertainty sets.
arXiv Detail & Related papers (2023-12-06T03:14:16Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper, we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically (a minimal sketch of the MMD estimator appears after this list).
arXiv Detail & Related papers (2021-10-15T21:29:27Z) - On Tilted Losses in Machine Learning: Theory and Applications [26.87656095874882]
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization.
We study a simple extension of ERM that uses exponential tilting to flexibly tune the impact of individual losses (the tilted objective is sketched after this list).
We find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
arXiv Detail & Related papers (2021-09-13T17:33:42Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce an MI minimization training scheme and further accelerate it with a negative sampling strategy (a minimal sketch of the CLUB bound appears after this list).
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.