Related papers: Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

URL: http://arxiv.org/abs/2312.04823v1
Date: Mon, 4 Dec 2023 01:32:42 GMT
Title: Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy
Authors: Danqi Liao, Chen Liu, Benjamin W. Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy
Abstract summary: Entropy and mutual information in neural networks provide rich information on the learning process. We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
Score: 55.014926694758195
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet. The official implementation is available at https://github.com/ChenLiu-1996/DiffusionSpectralEntropy.

Related papers

Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training [14.9343236333741]
We introduce matrix entropy as an analytical tool for studying supervised learning. We show that matrix entropy effectively captures the variations in information content of data representations as neural networks approach Neural Collapse. We also propose Cross-Model Alignment (CMA) loss to optimize the fine-tuning of pretrained models.
arXiv Detail & Related papers (2024-09-25T09:26:06Z)
Average gradient outer product as a mechanism for deep neural collapse [26.939895223897572]
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs) In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP) We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for neural networks trained in the feature learning regime.
arXiv Detail & Related papers (2024-02-21T11:40:27Z)
A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data [15.211387244155725]
Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity. Recent research on the application of Graph Neural Network (GNN) to FC suggests that exploiting the time-varying properties of the FC could significantly improve the accuracy and interpretability of the model prediction. High cost of acquiring high-quality fMRI data and corresponding labels poses a hurdle to their application in real-world settings. We propose a generative SSL approach that is tailored to effectively harnesstemporal information within dynamic FC.
arXiv Detail & Related papers (2023-12-04T16:14:43Z)
Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
Neural Koopman prior for data assimilation [7.875955593012905]
We use a neural network architecture to embed dynamical systems in latent spaces. We introduce methods that enable to train such a model for long-term continuous reconstruction. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques.
arXiv Detail & Related papers (2023-09-11T09:04:36Z)
Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data. Main aim of the identified model is to predict new data from previous observations. We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics. We then exploit higher-order statistics only later during training. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications. We show that the weights of a multidimensional regression model can be learned by means of tensor networks with the aim of performing a powerful compact representation. An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format with a reduction of computational power.
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
Multi-fidelity Bayesian Neural Networks: Algorithms and Applications [0.0]
We propose a new class of Bayesian neural networks (BNNs) that can be trained using noisy data of variable fidelity. We apply them to learn function approximations as well as to solve inverse problems based on partial differential equations (PDEs)
arXiv Detail & Related papers (2020-12-19T02:03:53Z)
Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks. We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a emphcovariance operator. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a emphhierarchical latent tree model (HLTM)
arXiv Detail & Related papers (2020-10-01T17:51:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.