MIST: Mutual Information Via Supervised Training
- URL: http://arxiv.org/abs/2511.18945v1
- Date: Mon, 24 Nov 2025 09:55:28 GMT
- Title: MIST: Mutual Information Via Supervised Training
- Authors: German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard
- Abstract summary: We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST). Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI.
- Score: 41.02529625643583
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
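To make the abstract's two key ingredients concrete, here is a minimal PyTorch sketch of (a) a permutation-invariant attention encoder over the sample axis and (b) the pinball loss used for quantile regression. This is our own illustration under simplifying assumptions, not the authors' code: the paper attends over both the sample and the dimension axes, whereas the sketch below handles only one-dimensional X and Y, and the names `SampleAttentionPooling`, `pinball_loss`, and all shapes are hypothetical.

```python
import torch
import torch.nn as nn

class SampleAttentionPooling(nn.Module):
    """Permutation-invariant quantile head (hypothetical simplification).

    Self-attention across the n sample points is equivariant to their
    order, and mean pooling then removes the order entirely, so the
    predicted quantiles do not change if the sample is shuffled."""
    def __init__(self, d_model=64, n_heads=4, n_quantiles=3):
        super().__init__()
        self.embed = nn.Linear(2, d_model)  # one (x_i, y_i) pair -> d_model features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_quantiles)

    def forward(self, xy):                  # xy: (batch, n, 2) paired samples
        h = self.embed(xy)
        h, _ = self.attn(h, h, h)           # attend across the sample axis
        return self.head(h.mean(dim=1))     # (batch, n_quantiles) MI quantiles

def pinball_loss(pred, target, taus=(0.05, 0.5, 0.95)):
    """Quantile regression loss: pushes each output toward the tau-quantile
    of the MI sampling distribution instead of a single point estimate."""
    taus = torch.as_tensor(taus, dtype=pred.dtype, device=pred.device)
    diff = target.unsqueeze(-1) - pred      # (batch, n_quantiles)
    return torch.mean(torch.maximum(taus * diff, (taus - 1.0) * diff))
```

Training would repeatedly sample a synthetic joint distribution from the meta-dataset, draw n points from it, and minimize `pinball_loss` between the predicted quantiles and the known ground-truth MI; the outer quantiles then directly yield the calibrated intervals described above.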
Related papers
- Minimum Distance Summaries for Robust Neural Posterior Estimation [7.4716500353679685]
Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE.
arXiv Detail & Related papers (2026-02-09T20:06:15Z)
- Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks [8.73717644648873]
We show how one can augment a point estimate to obtain full distributions without modifying the base model architecture. Our method performs on par with variational baselines while using far fewer parameters.
arXiv Detail & Related papers (2025-05-24T08:38:10Z)
- A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation [9.68824512279232]
Mutual Information (MI) is a crucial measure for capturing dependencies between variables. We present a solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder.
arXiv Detail & Related papers (2025-03-11T21:27:48Z)
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
- Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning [41.85464593920907]
Global inducing point variational approximations for BNNs use a set of inducing inputs to construct a series of conditional distributions.
Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihoods for each datapoint.
By training this inference network across related datasets, we can meta-learn Bayesian inference over task-specific BNNs.
arXiv Detail & Related papers (2023-10-24T12:34:25Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyze a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach [66.9033666087719]
This paper extends the inference view and describes a variational inference formulation of federated learning.
We apply FedEP to standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.
arXiv Detail & Related papers (2023-02-08T17:58:11Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework that tackles two central challenges, distributional heterogeneity across clients and inter-client noise, simultaneously.
We provide a comprehensive theoretical analysis covering robustness, convergence, and generalization.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Assessments of model-form uncertainty using Gaussian stochastic weight averaging for fluid-flow regression [0.0]
We use Gaussian stochastic weight averaging (SWAG) to assess the model-form uncertainty associated with neural-network-based function approximation relevant to fluid flows.
Given training data and a constant learning rate, SWAG approximates the posterior distribution of each weight with a Gaussian.
We demonstrate the applicability of the method for two types of neural networks.
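For reference, a diagonal-covariance version of the SWAG recipe can be sketched as follows (our illustration of the general technique, not the paper's implementation; `DiagonalSWAG` and its methods are hypothetical names):

```python
import torch

class DiagonalSWAG:
    """Fit an independent Gaussian per weight from snapshots taken along
    an SGD trajectory run with a constant learning rate (hypothetical sketch)."""
    def __init__(self, model):
        self.model = model
        self.n = 0
        w = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
        self.mean = torch.zeros_like(w)
        self.sq_mean = torch.zeros_like(w)

    def collect(self):
        # update running first and second moments with the current weights
        w = torch.nn.utils.parameters_to_vector(self.model.parameters()).detach()
        self.mean = (self.n * self.mean + w) / (self.n + 1)
        self.sq_mean = (self.n * self.sq_mean + w ** 2) / (self.n + 1)
        self.n += 1

    def sample(self):
        # draw one set of weights from the fitted Gaussian and load it into the model
        var = (self.sq_mean - self.mean ** 2).clamp(min=1e-30)
        w = self.mean + var.sqrt() * torch.randn_like(self.mean)
        torch.nn.utils.vector_to_parameters(w, self.model.parameters())
```

Calling `collect()` periodically during training and averaging predictions over repeated `sample()` draws at test time gives the model-form uncertainty as the spread of those predictions.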
arXiv Detail & Related papers (2021-09-16T23:13:26Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval.
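For context on how PD connects to the MI estimation theme above: pointwise dependency is the ratio PD(x, y) = p(x, y) / (p(x) p(y)), and mutual information is its expected logarithm, I(X; Y) = E_{p(x,y)}[log PD(X, Y)], so an accurate PD estimator immediately yields an MI estimate.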
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for metric-based few-shot approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample so as to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
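A hypothetical sketch of such a confidence-weighted prototype update (our illustration of the idea in the summary above; the function name and the exact blending rule are assumptions, not the paper's formulation):

```python
import torch

def update_prototypes(protos, queries, confidence):
    """protos: (C, d) class prototypes from the labeled support set;
    queries: (Q, d) embeddings of unlabeled query examples;
    confidence: (Q, C) meta-learned per-query, per-class weights in [0, 1].
    Each prototype is pulled toward the confidence-weighted mean of the
    queries assigned to its class."""
    weighted_sum = confidence.t() @ queries            # (C, d) weighted query mass
    total_weight = confidence.sum(dim=0).unsqueeze(1)  # (C, 1) total weight per class
    return (protos + weighted_sum) / (1.0 + total_weight)
```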
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.