Measuring Information Transfer in Neural Networks
- URL: http://arxiv.org/abs/2009.07624v2
- Date: Thu, 3 Dec 2020 07:16:10 GMT
- Title: Measuring Information Transfer in Neural Networks
- Authors: Xiao Zhang, Xingjian Li, Dejing Dou, Ji Wu
- Abstract summary: Quantifying the information content in a neural network model is essentially estimating the model's Kolmogorov complexity.
We propose a measure of the generalizable information in a neural network model based on prequential coding.
We show that $L_{IT}$ is consistently correlated with generalizable information and can be used as a measure of patterns or "knowledge" in a model or a dataset.
- Score: 46.37969746096677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying the information content in a neural network model is essentially
estimating the model's Kolmogorov complexity. Recent success of prequential
coding on neural networks points to a promising path of deriving an efficient
description length of a model. We propose a practical measure of the
generalizable information in a neural network model based on prequential
coding, which we term Information Transfer ($L_{IT}$). Theoretically, $L_{IT}$
is an estimation of the generalizable part of a model's information content. In
experiments, we show that $L_{IT}$ is consistently correlated with
generalizable information and can be used as a measure of patterns or
"knowledge" in a model or a dataset. Consequently, $L_{IT}$ can serve as a
useful analysis tool in deep learning. In this paper, we apply $L_{IT}$ to
compare and dissect information in datasets, evaluate representation models in
transfer learning, and analyze catastrophic forgetting and continual learning
algorithms. $L_{IT}$ provides an information perspective which helps us
discover new insights into neural network learning.
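To make the prequential-coding idea concrete, below is a minimal sketch of prequential codelength estimation. It assumes a scikit-learn-style classifier with `fit`/`predict_proba` and integer labels in `0..n_classes-1`; the chunk schedule, the uniform code for the first chunk, and the reading of $L_{IT}$ in the final comment are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def prequential_codelength(model, X, y, chunk_sizes, n_classes):
    """Prequential description length of labels y given inputs X, in nats.

    Data is revealed in increasing chunks: each new chunk is first encoded
    with the model trained on all previously seen data (its log-loss is the
    code cost), then added to the training set and the model is refit.
    """
    total_nats, seen = 0.0, 0
    for end in np.cumsum(chunk_sizes):
        X_next, y_next = X[seen:end], y[seen:end]
        if seen == 0:
            # No model yet: encode the first chunk under a uniform prior.
            total_nats += len(y_next) * np.log(n_classes)
        else:
            proba = model.predict_proba(X_next)  # assumes labels 0..n_classes-1
            total_nats -= np.log(proba[np.arange(len(y_next)), y_next] + 1e-12).sum()
        model.fit(X[:end], y[:end])  # retrain on everything seen so far
        seen = end
    return total_nats

# One plausible reading of information transfer (an assumption; see the
# paper for the exact definition of L_IT): the codelength saved by starting
# from a pretrained model rather than a freshly initialized one, e.g.
# L_IT ~ prequential_codelength(fresh, ...) - prequential_codelength(pretrained, ...)
```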
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset [0.0]
An iterative self-transfer learning method for training neural networks on small datasets is proposed in this study.
The results show that the proposed method can improve model performance by nearly an order of magnitude on small datasets.
arXiv Detail & Related papers (2023-06-14T18:48:04Z)
- Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^{\star}$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks (a generic form of the objective is sketched after this entry).
arXiv Detail & Related papers (2023-05-07T13:05:09Z)
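For orientation, here is a generic form of a LASSO-regularized two-layer ReLU regression objective. This is a sketch of the standard formulation only; the exact parameterization, the placement of the $\ell_1$ penalty, and the precise meaning of $\mathcal{S}^{\star}$ (the true support to be identified) follow the paper and may differ:

$$\min_{\{a_j, w_j\}_{j=1}^{m}} \; \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m} a_j\,\mathrm{ReLU}(w_j^{\top}x_i)\Big)^{2} \;+\; \lambda \sum_{j=1}^{m}\big(|a_j| + \|w_j\|_{1}\big).$$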
- An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla (see the sketch after this entry).
arXiv Detail & Related papers (2022-12-02T18:46:41Z)
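As a rough sketch of that linear relation (our arithmetic from the commonly used compute approximation, not a result reproduced from the paper): with training compute $C \approx 6ND$ for $N$ parameters and $D$ training tokens, Chinchilla-style compute-optimal fits allocate

$$N^{\ast}(C) \propto C^{0.5}, \qquad D^{\ast}(C) \propto C^{0.5} \quad\Longrightarrow\quad D^{\ast} \propto N^{\ast},$$

i.e., the optimal dataset size grows linearly with the optimal model size.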
- Interpretability of an Interaction Network for identifying $H \rightarrow b\bar{b}$ jets [4.553120911976256]
In recent times, AI models based on deep neural networks are becoming increasingly popular for many applications in high-energy physics.
We explore the interpretability of AI models by examining an Interaction Network (IN) model designed to identify boosted $H \rightarrow b\bar{b}$ jets.
We additionally illustrate the activity of hidden layers within the IN model as Neural Activation Pattern (NAP) diagrams.
arXiv Detail & Related papers (2022-11-23T08:38:52Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Mutual information estimation for graph convolutional neural networks [0.0]
We present an architecture-agnostic method for tracking a network's internal representations during training, which are then used to create a mutual information plane (a minimal binning-based estimator is sketched after this entry).
We compare how the inductive bias introduced in graph-based architectures changes the mutual information plane relative to a fully connected neural network.
arXiv Detail & Related papers (2022-03-31T08:30:04Z)
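Below is a minimal sketch of the kind of estimator commonly used to build an information plane, using the classic quantize-and-count approach over hidden activations; the paper's architecture-agnostic estimator may differ, and the bin count is an illustrative assumption.

```python
import numpy as np

def discrete_mutual_information(a, b):
    """Mutual information (nats) between two aligned arrays of integer codes."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)     # empirical joint counts
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def activation_codes(T, n_bins=30):
    """Quantize activations T (n_samples, n_units) and hash rows to codes."""
    edges = np.linspace(T.min(), T.max(), n_bins)
    binned = np.digitize(T, edges)
    _, codes = np.unique(binned, axis=0, return_inverse=True)
    return codes

# Information-plane coordinates for one layer at one training step, given
# x_codes from the quantized inputs and integer class labels y:
# I_XT = discrete_mutual_information(activation_codes(T), x_codes)
# I_TY = discrete_mutual_information(activation_codes(T), y)
```

Plotting $I(X;T)$ against $I(T;Y)$ for each layer's activations $T$ across training epochs then traces out the mutual information plane.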
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections (the gradient-flow view is sketched after this entry).
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
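The equivalence invoked above is, in its simplest continuous-time form, the standard gradient-flow view of training (a textbook identity, not the paper's full edge-dynamics model):

$$\frac{dw_{ij}}{dt} = -\eta\,\frac{\partial \mathcal{L}(w)}{\partial w_{ij}},$$

so gradient descent is a discretization of an ODE governing each synaptic connection $w_{ij}$, and tracking these dynamics early in training is what allows performance to be predicted without training to convergence.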
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where the task is not purely opaque: partial knowledge about the underlying system is available.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data (the definition is sketched after this entry).
arXiv Detail & Related papers (2020-02-25T06:09:30Z)
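For reference, the usual definition of predictive $\mathcal{V}$-information from this line of work, reproduced from memory and so best treated as a sketch: given a predictive family $\mathcal{V}$,

$$H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \, \mathbb{E}_{x,y}\big[-\log f[x](y)\big], \qquad I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X).$$

Because the infimum ranges only over models in $\mathcal{V}$ that can actually be fit, both terms reduce to standard log-loss minimization and can be estimated from finite samples, which is what makes the quantity computable under computational constraints.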