On Information Plane Analyses of Neural Network Classifiers -- A Review
- URL: http://arxiv.org/abs/2003.09671v3
- Date: Thu, 10 Jun 2021 15:06:30 GMT
- Title: On Information Plane Analyses of Neural Network Classifiers -- A Review
- Authors: Bernhard C. Geiger
- Abstract summary: We show that compression visualized in information planes is not necessarily information-theoretic.
We argue that even in feed-forward neural networks the data processing inequality need not hold for estimates of mutual information.
- Score: 7.804994311050265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We review the current literature concerned with information plane analyses of
neural network classifiers. While the underlying information bottleneck theory
and the claim that information-theoretic compression is causally linked to
generalization are plausible, empirical evidence was found to be both
supporting and conflicting. We review this evidence together with a detailed
analysis of how the respective information quantities were estimated. Our
survey suggests that compression visualized in information planes is not
necessarily information-theoretic, but is rather often compatible with
geometric compression of the latent representations. This insight gives the
information plane a renewed justification.
Aside from this, we shed light on the problem of estimating mutual
information in deterministic neural networks and its consequences.
Specifically, we argue that even in feed-forward neural networks the data
processing inequality need not hold for estimates of mutual information.
Similarly, while a fitting phase, in which the mutual information between the
latent representation and the target increases, is necessary (but not
sufficient) for good classification performance, depending on the specifics of
mutual information estimation such a fitting phase need not be visible in the
information plane.
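To make the estimation issue concrete, the following is a minimal, hypothetical sketch (not code from the reviewed papers) of the binning-based plug-in estimator commonly used to draw information planes. For a deterministic network, H(T|X) = 0, so the estimate of I(X;T) reduces to the entropy of the discretized activations and therefore depends on the chosen bin resolution rather than on an information-theoretic property of the continuous representation. The function and variable names below are illustrative assumptions.

```python
# Hypothetical sketch: binning-based plug-in estimate of I(X;T) for an
# information plane, where T = f(X) are deterministic layer activations.
import numpy as np

def binned_information(activations: np.ndarray, n_bins: int = 30) -> float:
    """Plug-in estimate of I(X;T) in bits from equal-width binning of activations."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])      # (n_samples, dim), values 0..n_bins-1
    # Each distinct binned activation vector is one symbol of the discretized T;
    # since T = f(X) is deterministic, H(T|X) = 0 and the estimate equals H(binned T).
    _, counts = np.unique(binned, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# The same hidden activations yield very different "information" estimates
# depending only on the bin resolution.
rng = np.random.default_rng(0)
acts = np.tanh(rng.normal(size=(1000, 8)))              # stand-in for one hidden layer
for n_bins in (4, 30, 256):
    print(n_bins, round(binned_information(acts, n_bins), 2))
```

With coarse bins, distinct inputs collapse to the same code and the estimate drops; with fine bins, it saturates near log2 of the sample size. This is one way apparent "compression" can reflect geometric clustering of representations rather than a change in mutual information.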
Related papers
- Information Plane Analysis Visualization in Deep Learning via Transfer Entropy [0.0]
In a feedforward network, Transfer Entropy can be used to measure the influence that one layer has on another.
In contrast to mutual information, TE can capture temporal relationships between variables; a standard definition is sketched after this list.
arXiv Detail & Related papers (2024-04-01T17:34:18Z)
- TexShape: Information Theoretic Sentence Embedding for Language Models [5.265661844206274]
This paper addresses challenges regarding encoding sentences into their optimized representations through the lens of information theory.
We use empirical estimates of mutual information, based on the Donsker-Varadhan representation of the Kullback-Leibler divergence (also sketched after this list).
Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information over adverse compression ratios.
arXiv Detail & Related papers (2024-02-05T22:48:28Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments under Independent and Identically Distributed (IID) and Out-of-Distribution (OOD) conditions.
Under the IID condition, the amount of information determines the effectiveness of each sample, while the contribution of samples and the differences between classes determine the amount of class information.
Under the OOD condition, the cross-domain degree of the samples determines their contributions, and the bias-fitting caused by irrelevant elements is a significant factor in cross-domain performance.
arXiv Detail & Related papers (2022-05-30T15:40:33Z)
- Mutual information estimation for graph convolutional neural networks [0.0]
We present an architecture-agnostic method for tracking a network's internal representations during training, which are then used to create a mutual information plane.
We compare how the inductive bias introduced in graph-based architectures changes the mutual information plane relative to a fully connected neural network.
arXiv Detail & Related papers (2022-03-31T08:30:04Z)
- Decomposing neural networks as mappings of correlation functions [57.52754806616669]
We study the mapping between probability distributions implemented by a deep feed-forward network.
We identify essential statistics in the data, as well as different information representations that can be used by neural networks.
arXiv Detail & Related papers (2022-02-10T09:30:31Z)
- A Bayesian Framework for Information-Theoretic Probing [51.98576673620385]
We argue that probing should be seen as approximating a mutual information.
This view leads to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences.
This paper proposes a new framework to measure what we term Bayesian mutual information.
arXiv Detail & Related papers (2021-09-08T18:08:36Z)
- Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
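For reference, two quantities mentioned in the list above can be written in their standard textbook forms; this is a brief sketch of the usual definitions, not notation taken from the listed papers. The transfer entropy from a source process $X$ to a target process $Y$ (here with history length one) is the conditional mutual information

$$ T_{X \to Y} = I(Y_{t+1}; X_t \mid Y_t), $$

which, unlike plain mutual information, is directional and sensitive to temporal ordering. The Donsker-Varadhan representation of the Kullback-Leibler divergence, which underlies the empirical mutual information estimates mentioned for TexShape, is

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{g \colon \Omega \to \mathbb{R}} \; \mathbb{E}_{P}[g] - \log \mathbb{E}_{Q}\!\left[e^{g}\right], $$

so that choosing $P = P_{XY}$ and $Q = P_X \otimes P_Y$ and restricting $g$ to a parametric family (e.g., a neural network) yields a lower bound on $I(X;Y)$.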