A robust estimator of mutual information for deep learning interpretability
- URL: http://arxiv.org/abs/2211.00024v2
- Date: Thu, 23 Mar 2023 16:18:11 GMT
- Title: A robust estimator of mutual information for deep learning interpretability
- Authors: Davide Piras, Hiranya V. Peiris, Andrew Pontzen, Luisa Lucie-Smith,
Ningyuan Guo, Brian Nord
- Abstract summary: We present GMM-MI, an algorithm that can be applied to both discrete and continuous settings.
We extensively validate GMM-MI on toy data for which the ground truth MI is known.
We then demonstrate the use of our MI estimator in the context of representation learning.
- Score: 2.574652392763709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop the use of mutual information (MI), a well-established metric in
information theory, to interpret the inner workings of deep learning models. To
accurately estimate MI from a finite number of samples, we present GMM-MI
(pronounced "Jimmie"), an algorithm based on Gaussian mixture models that
can be applied to both discrete and continuous settings. GMM-MI is
computationally efficient, robust to the choice of hyperparameters and provides
the uncertainty on the MI estimate due to the finite sample size. We
extensively validate GMM-MI on toy data for which the ground truth MI is known,
comparing its performance against established mutual information estimators. We
then demonstrate the use of our MI estimator in the context of representation
learning, working with synthetic data and physical datasets describing highly
non-linear processes. We train deep learning models to encode high-dimensional
data within a meaningful compressed (latent) representation, and use GMM-MI to
quantify both the level of disentanglement between the latent variables, and
their association with relevant physical quantities, thus unlocking the
interpretability of the latent representation. We make GMM-MI publicly
available.
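The idea in the abstract, estimating MI from samples with a Gaussian mixture model, can be illustrated with a minimal sketch. This is not the authors' GMM-MI package or its API: the function names `gmm_mi_sketch` and `marginal_logpdf`, the fixed number of components, and the Monte Carlo sample size are all assumptions made for illustration. The sketch fits a GMM to the joint samples with scikit-learn, uses the fact that the marginals of a GMM are again Gaussian mixtures, and averages the log density ratio over samples drawn from the fitted model.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture


def marginal_logpdf(gmm, values, dim):
    """Log-density of the 1D marginal of a fitted full-covariance GMM.

    Each marginal component keeps its weight, with mean means_[:, dim]
    and variance covariances_[:, dim, dim].
    """
    mu = gmm.means_[:, dim]
    var = gmm.covariances_[:, dim, dim]
    log_comp = -0.5 * (np.log(2 * np.pi * var) + (values[:, None] - mu) ** 2 / var)
    return logsumexp(log_comp + np.log(gmm.weights_), axis=1)


def gmm_mi_sketch(x, y, n_components=3, n_mc=50_000, seed=0):
    """Hypothetical helper: estimate I(X;Y) in nats for 1D x and y.

    Assumption: the joint density is well described by a GMM with a
    fixed, user-chosen number of components.
    """
    data = np.column_stack([x, y])
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full", random_state=seed
    ).fit(data)
    # Monte Carlo integration: draw from the fitted joint density and
    # average log p(x, y) - log p(x) - log p(y).
    samples, _ = gmm.sample(n_mc)
    log_joint = gmm.score_samples(samples)
    log_px = marginal_logpdf(gmm, samples[:, 0], dim=0)
    log_py = marginal_logpdf(gmm, samples[:, 1], dim=1)
    return float(np.mean(log_joint - log_px - log_py))
```

For a bivariate Gaussian with correlation coefficient rho, the exact value is -0.5 * ln(1 - rho^2), which gives a simple sanity check for the sketch. One plausible way to obtain an uncertainty of the kind the abstract mentions would be to repeat the fit on bootstrap resamples of the data and report the spread of the estimates; that step is not shown here.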
Related papers
- Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets [1.8434042562191815]
We derive expressions for the Bayes optimal decision boundaries in binary classification of high dimensional overlapping Gaussian mixture model (GMM) data.
We empirically demonstrate, through experiments on synthetic GMMs inspired by real-world data, that deep neural networks trained for classification learn predictors which approximate the derived optimal classifiers.
arXiv Detail & Related papers (2024-05-28T17:59:31Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Max-Sliced Mutual Information [17.667315953598788]
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference.
Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure.
This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI).
arXiv Detail & Related papers (2023-09-28T06:49:25Z)
- Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models [1.0878040851638]
This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model.
The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment.
To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud.
arXiv Detail & Related papers (2023-09-19T19:49:03Z)
- Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
arXiv Detail & Related papers (2023-03-13T10:47:24Z)
- k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension [21.82863736290358]
We extend the original SMI definition to $k$-SMI, which considers projections to $k$-dimensional subspaces.
Using a new result on the continuity of differential entropy in the 2-Wasserstein metric, we derive sharp bounds on the error of Monte Carlo (MC)-based estimates of $k$-SMI.
We then combine the MC integrator with the neural estimation framework to provide an end-to-end $k$-SMI estimator.
arXiv Detail & Related papers (2022-06-17T03:19:55Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that GMR achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z)
- A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models [78.6363825307044]
This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs).
We show that energy-based SOM models can be interpreted as performing gradient descent.
This link allows SOMs to be treated as generative probabilistic models, giving a formal justification for using SOMs to detect outliers or to draw samples.
arXiv Detail & Related papers (2020-09-24T14:09:04Z)
- Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representation.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
arXiv Detail & Related papers (2020-05-03T16:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.