Related papers: A robust estimator of mutual information for deep learning interpretability

URL: http://arxiv.org/abs/2211.00024v2
Date: Thu, 23 Mar 2023 16:18:11 GMT
Title: A robust estimator of mutual information for deep learning interpretability
Authors: Davide Piras, Hiranya V. Peiris, Andrew Pontzen, Luisa Lucie-Smith, Ningyuan Guo, Brian Nord
Abstract summary: We present GMM-MI, an algorithm that can be applied to both discrete and continuous settings. We extensively validate GMM-MI on toy data for which the ground truth MI is known. We then demonstrate the use of our MI estimator in the context of representation learning.
Score: 2.574652392763709
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We develop the use of mutual information (MI), a well-established metric in information theory, to interpret the inner workings of deep learning models. To accurately estimate MI from a finite number of samples, we present GMM-MI (pronounced "Jimmie"), an algorithm based on Gaussian mixture models that can be applied to both discrete and continuous settings. GMM-MI is computationally efficient, robust to the choice of hyperparameters and provides the uncertainty on the MI estimate due to the finite sample size. We extensively validate GMM-MI on toy data for which the ground truth MI is known, comparing its performance against established mutual information estimators. We then demonstrate the use of our MI estimator in the context of representation learning, working with synthetic data and physical datasets describing highly non-linear processes. We train deep learning models to encode high-dimensional data within a meaningful compressed (latent) representation, and use GMM-MI to quantify both the level of disentanglement between the latent variables, and their association with relevant physical quantities, thus unlocking the interpretability of the latent representation. We make GMM-MI publicly available.
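
Illustrative sketch (not the authors' GMM-MI code): the abstract mentions validation on toy data with known ground-truth MI. One standard case where the ground truth is available in closed form is a bivariate Gaussian with correlation rho, for which MI = -0.5 * ln(1 - rho^2) nats. The choice of this toy case, the function names, and the sample size below are assumptions made for illustration. The Monte Carlo estimator averages log p(x, y) - log p(x) - log p(y) over samples, which is roughly the integral a density-based estimator would evaluate once a density model (here the true single Gaussian, standing in for a fitted mixture) is known.

```python
import math
import random


def gaussian_mi_exact(rho):
    """Closed-form MI (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


def log_normal_pdf(x):
    """Log density of a standard normal (zero mean, unit variance)."""
    return -0.5 * (math.log(2.0 * math.pi) + x * x)


def log_bivariate_normal_pdf(x, y, rho):
    """Log density of a zero-mean, unit-variance bivariate Gaussian."""
    one_minus_r2 = 1.0 - rho ** 2
    quad = (x * x - 2.0 * rho * x * y + y * y) / one_minus_r2
    return -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus_r2) - 0.5 * quad


def gaussian_mi_monte_carlo(rho, n_samples=20000, seed=0):
    """Monte Carlo MI estimate: mean of log p(x,y) - log p(x) - log p(y)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for _ in range(n_samples):
        # Draw a correlated pair (x, y) with unit variances and correlation rho.
        x = rng.gauss(0.0, 1.0)
        y = rho * x + scale * rng.gauss(0.0, 1.0)
        total += (
            log_bivariate_normal_pdf(x, y, rho)
            - log_normal_pdf(x)
            - log_normal_pdf(y)
        )
    return total / n_samples


if __name__ == "__main__":
    rho = 0.6
    print("exact MI (nats):      ", gaussian_mi_exact(rho))
    print("Monte Carlo MI (nats):", gaussian_mi_monte_carlo(rho))
```

In this sketch the density is known exactly; in a mixture-model approach the log densities would instead come from a model fitted to the samples, and repeating the estimate over resampled data is one plausible way to obtain an uncertainty on the result.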

Related papers

Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance. Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws. We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
A Benchmark Suite for Evaluating Neural Mutual Information Estimators on Unstructured Datasets [3.2228025627337864]
Mutual Information (MI) is a fundamental metric for quantifying dependency between two random variables. This study introduces a comprehensive benchmark suite for evaluating neural MI estimators on unstructured datasets.
arXiv Detail & Related papers (2024-10-14T14:22:38Z)
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities. The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks. We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
arXiv Detail & Related papers (2024-10-13T05:26:36Z)
Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data. Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership. We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets [1.8434042562191815]
We derive expressions for the Bayes optimal decision boundaries in binary classification of high-dimensional overlapping Gaussian mixture model (GMM) data. We empirically demonstrate, through experiments on synthetic GMMs inspired by real-world data, that deep neural networks trained for classification learn predictors which approximate the derived optimal classifiers.
arXiv Detail & Related papers (2024-05-28T17:59:31Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs). Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference. Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models [1.0878040851638]
This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model. The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment. To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud.
arXiv Detail & Related papers (2023-09-19T19:49:03Z)
Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning. We present a unifying view of existing MI bounds from the perspective of importance sampling. We propose three novel bounds based on this approach.
arXiv Detail & Related papers (2023-03-13T10:47:24Z)
k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension [21.82863736290358]
We extend the original SMI definition to $k$-SMI, which considers projections to $k$-dimensional subspaces. Using a new result on the continuity of differential entropy in the 2-Wasserstein metric, we derive sharp bounds on the error of Monte Carlo (MC)-based estimates of $k$-SMI. We then combine the MC integrator with the neural estimation framework to provide an end-to-end $k$-SMI estimator.
arXiv Detail & Related papers (2022-06-17T03:19:55Z)
Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning. We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities. We show that GMR achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z)
Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning. Recent advances establish tractable and scalable MI estimators to discover useful representations. We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
arXiv Detail & Related papers (2020-05-03T16:05:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.