Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy
- URL: http://arxiv.org/abs/2312.04823v1
- Date: Mon, 4 Dec 2023 01:32:42 GMT
- Title: Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy
- Authors: Danqi Liao, Chen Liu, Benjamin W. Christensen, Alexander Tong,
Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita
Krishnaswamy
- Abstract summary: Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
- Score: 55.014926694758195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Entropy and mutual information in neural networks provide rich information on
the learning process, but they have proven difficult to compute reliably in
high dimensions. Indeed, in noisy and high-dimensional data, traditional
estimates in ambient dimensions approach a fixed entropy and are prohibitively
hard to compute. To address these issues, we leverage data geometry to access
the underlying manifold and reliably compute these information-theoretic
measures. Specifically, we define diffusion spectral entropy (DSE) in neural
representations of a dataset as well as diffusion spectral mutual information
(DSMI) between different variables representing data. First, we show that they
form noise-resistant measures of intrinsic dimensionality and relationship
strength in high-dimensional simulated data that outperform classic Shannon
entropy, nonparametric estimation, and mutual information neural estimation
(MINE). We then study the evolution of representations in classification
networks with supervised learning, self-supervision, or overfitting. We observe
that (1) DSE of neural representations increases during training; (2) DSMI with
the class label increases during generalizable learning but stays stagnant
during overfitting; (3) DSMI with the input signal shows differing trends: on
MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show
that DSE can be used to guide better network initialization and that DSMI can
be used to predict downstream classification accuracy across 962 models on
ImageNet. The official implementation is available at
https://github.com/ChenLiu-1996/DiffusionSpectralEntropy.
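The abstract names DSE and DSMI but does not give their formulas. Below is a minimal NumPy sketch under several stated assumptions. Points are connected by a fixed-bandwidth Gaussian kernel `K`, where `sigma` is an assumed parameter. DSE is taken as the Shannon entropy of the normalized, `t`-th-powered eigenvalues of the diffusion operator `P = D^-1 K`, where `D` is the diagonal degree matrix and `t` is an assumed diffusion-time parameter. DSMI with a discrete variable `Y`, such as a class label, is taken as `S_D(Z) - sum_y p(y) * S_D(Z | Y = y)`. The function names are hypothetical and are not the API of the official repository linked above. This is an illustrative reconstruction, not the authors' code.

```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """Dense Gaussian affinity matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def diffusion_spectral_entropy(X, sigma=1.0, t=1):
    """Entropy (in bits) of the normalized spectrum of the diffusion operator P = D^-1 K.

    Hypothetical helper for illustration; not the official implementation.
    """
    K = gaussian_kernel(X, sigma)
    d = K.sum(axis=1)  # degrees; d >= 1 because K[i, i] = 1
    # A = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P = D^-1 K,
    # so eigvalsh gives a stable, real-valued decomposition.
    A = K / np.sqrt(np.outer(d, d))
    eigvals = np.linalg.eigvalsh(A)
    powered = np.abs(eigvals) ** t     # t diffusion steps scale each eigenvalue to lambda^t
    alpha = powered / powered.sum()    # normalize the spectrum into a probability distribution
    alpha = alpha[alpha > 0]           # treat 0 * log 0 as 0
    return float(-(alpha * np.log2(alpha)).sum())


def diffusion_spectral_mi(X, labels, sigma=1.0, t=1):
    """Assumed DSMI with a discrete variable: S_D(Z) - sum_y p(y) * S_D(Z | Y = y)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    conditional = 0.0
    for y in np.unique(labels):
        mask = labels == y
        # weight each class-conditional entropy by the empirical class frequency p(y)
        conditional += mask.mean() * diffusion_spectral_entropy(X[mask], sigma, t)
    return diffusion_spectral_entropy(X, sigma, t) - conditional
```

The Gaussian kernel is positive semidefinite, so the eigenvalues here lie in [0, 1]. Raising them to the power `t` shrinks the small, noise-dominated eigenvalues relative to the leading ones, which is one plausible reading of why the abstract calls DSE noise-resistant. The dense eigendecomposition costs O(n^3), so this sketch suits a few thousand points at most. `sigma` should be matched to the scale of the representations.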
Related papers
- Average gradient outer product as a mechanism for deep neural collapse [26.939895223897572]
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs).
In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP).
We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for neural networks trained in the feature learning regime.
arXiv (2024-02-21T11:40:27Z)
- A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data [15.211387244155725]
Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity.
Recent research on applying Graph Neural Networks (GNNs) to FC suggests that exploiting the time-varying properties of FC could significantly improve the accuracy and interpretability of model predictions.
The high cost of acquiring high-quality fMRI data and corresponding labels poses a hurdle to their application in real-world settings.
We propose a generative self-supervised learning (SSL) approach that is tailored to effectively harness temporal information within dynamic FC.
arXiv (2023-12-04T16:14:43Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv (2023-10-02T03:25:32Z)
- Neural Koopman prior for data assimilation [7.875955593012905]
We use a neural network architecture to embed dynamical systems in latent spaces.
We introduce methods that make it possible to train such a model for long-term continuous reconstruction.
The potential for self-supervised learning is also demonstrated: we show the promising use of trained dynamical models as priors for variational data assimilation techniques.
arXiv (2023-09-11T09:04:36Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, such as feedforward, convolutional, and recurrent networks.
arXiv (2023-01-30T12:38:31Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics and exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv (2022-11-21T15:27:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv (2022-02-01T17:11:13Z)
- Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy for modeling multidimensional data is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful compact representation.
An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format at reduced computational cost.
arXiv (2021-01-22T16:14:38Z)
- Multi-fidelity Bayesian Neural Networks: Algorithms and Applications [0.0]
We propose a new class of Bayesian neural networks (BNNs) that can be trained using noisy data of variable fidelity.
We apply them to learn function approximations as well as to solve inverse problems based on partial differential equations (PDEs).
arXiv (2020-12-19T02:03:53Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv (2020-10-01T17:51:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
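The first related entry attributes within-class variability collapse to feature learning through the average gradient outer product (AGOP). Consider a model f: R^d -> R^k with input Jacobian J_f, evaluated at inputs x_1, ..., x_n. The AGOP is commonly defined as the d x d matrix (1/n) * sum_i J_f(x_i)^T J_f(x_i). The sketch below computes it for a hypothetical one-hidden-layer tanh network using analytic Jacobians. The model, weight shapes, and function names are assumptions for illustration and do not reproduce that paper's experimental setup.

```python
import numpy as np


def agop(jacobians):
    """Average gradient outer product: mean over samples of J^T J, for J of shape (k, d)."""
    J = np.asarray(jacobians, dtype=float)  # shape (n, k, d)
    return np.einsum("nki,nkj->ij", J, J) / J.shape[0]


def two_layer_jacobians(X, W, A):
    """Input Jacobians of the hypothetical model f(x) = A @ tanh(W @ x).

    X: (n, d) inputs, W: (h, d) first-layer weights, A: (k, h) output weights.
    J(x) = A @ diag(1 - tanh(W x)^2) @ W, returned with shape (n, k, d).
    """
    S = 1.0 - np.tanh(X @ W.T) ** 2  # (n, h) derivative of tanh at each hidden unit
    return np.einsum("kh,nh,hd->nkd", A, S, W)


# Random toy weights and inputs, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W = rng.normal(size=(16, 5))
A = rng.normal(size=(3, 16))
M = agop(two_layer_jacobians(X, W, A))  # symmetric, positive semidefinite 5 x 5 matrix
```

The top eigenvectors of `M` are the input directions along which the model's output varies most on average over the data. This is the sense in which the AGOP summarizes which features a network has learned.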