Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
- URL: http://arxiv.org/abs/2409.16767v2
- Date: Fri, 28 Feb 2025 14:19:00 GMT
- Title: Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
- Authors: Kun Song, Zhiquan Tan, Bochao Zou, Jiansheng Chen, Huimin Ma, Weiran Huang
- Abstract summary: We introduce matrix entropy as an analytical tool for studying supervised learning. We show that matrix entropy effectively captures the variations in information content of data representations as neural networks approach Neural Collapse. We also propose Cross-Model Alignment (CMA) loss to optimize the fine-tuning of pretrained models.
- Score: 14.9343236333741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce matrix entropy as an analytical tool for studying supervised learning, investigating the information content of data representations and classification head vectors, as well as the dynamic interactions between them during the supervised learning process. Our experimental results reveal that matrix entropy effectively captures the variations in information content of data representations and classification head vectors as neural networks approach Neural Collapse during supervised training, while also serving as a robust metric for measuring similarity among data samples. Leveraging this property, we propose Cross-Model Alignment (CMA) loss to optimize the fine-tuning of pretrained models. To characterize the dynamics of neural networks nearing the Neural Collapse state, we introduce two novel metrics: the Matrix Mutual Information Ratio (MIR) and the Matrix Entropy Difference Ratio (HDR), which quantitatively assess the interactions between data representations and classification heads in supervised learning, with theoretical optimal values derived under the Neural Collapse state. Our experiments demonstrate that MIR and HDR effectively explain various phenomena in neural networks, including the dynamics of standard supervised training and linear mode connectivity. Moreover, we use MIR and HDR to analyze the dynamics of grokking, a fascinating phenomenon in supervised learning where a model unexpectedly begins to generalize long after it has fit the training data.
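As a concrete reference for the quantities above, here is a minimal NumPy sketch of matrix (von Neumann) entropy and matrix mutual information, following the matrix-based entropy framework the paper builds on. The MIR and HDR normalizations shown, and the pairing of class-mean representations with classification head vectors, are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def _gram(Z: np.ndarray) -> np.ndarray:
    # Trace-normalized Gram matrix of row-L2-normalized representations Z (n, d).
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    K = (Z / norms) @ (Z / norms).T
    return K / np.trace(K)

def _entropy(K: np.ndarray) -> float:
    # Von Neumann entropy of a unit-trace PSD matrix: -sum(lam * log(lam)).
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(-(lam * np.log(lam)).sum())

def matrix_entropy(Z: np.ndarray) -> float:
    # Matrix entropy of a batch of representations.
    return _entropy(_gram(Z))

def matrix_mutual_information(Z: np.ndarray, W: np.ndarray) -> float:
    # I(Z; W) = H(Z) + H(W) - H(Z, W); the joint entropy uses the Hadamard
    # product of the two Gram matrices, the standard convention in the
    # matrix-based entropy framework. Z and W need the same number of rows,
    # e.g. class-mean features paired with classification head vectors
    # (an assumed pairing, for illustration only).
    Kz, Kw = _gram(Z), _gram(W)
    Kj = Kz * Kw
    Kj = Kj / np.trace(Kj)
    return _entropy(Kz) + _entropy(Kw) - _entropy(Kj)

def mir(Z: np.ndarray, W: np.ndarray) -> float:
    # Matrix Mutual Information Ratio: MI normalized by the smaller marginal
    # entropy (an assumed normalization; the paper derives the exact form and
    # its optimal value under Neural Collapse).
    return matrix_mutual_information(Z, W) / min(matrix_entropy(Z), matrix_entropy(W))

def hdr(Z: np.ndarray, W: np.ndarray) -> float:
    # Matrix Entropy Difference Ratio: relative gap between the two marginal
    # entropies (again an assumed normalization).
    hz, hw = matrix_entropy(Z), matrix_entropy(W)
    return abs(hz - hw) / max(hz, hw)
```

Under Neural Collapse, class means and classification head vectors converge to the same simplex equiangular tight frame, so one would expect the two marginal entropies to coincide (HDR near 0) and MIR to sit near its theoretical optimum; the exact optimal values are derived in the paper.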
Related papers
- Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields [1.7396556690675236]
In this work, we employ neural fields, which use neural networks to map a coordinate to the corresponding physical property value at that coordinate, in a test-time learning manner.
For a test-time learning method, the weights are learned during the inversion, in contrast to traditional approaches, which require a network to be trained using a training data set.
arXiv Detail & Related papers (2025-03-21T19:32:52Z) - Predicting Steady-State Behavior in Complex Networks with Graph Neural Networks [0.0]
In complex systems, information propagation can be categorized as diffusive (delocalized), weakly localized, or strongly localized.
This study investigates the application of graph neural network models to learn the behavior of a linear dynamical system on networks.
arXiv Detail & Related papers (2025-02-02T17:29:10Z) - Inferring stochastic low-rank recurrent neural networks from neural data [5.179844449042386]
A central aim in computational neuroscience is to relate the activity of large populations of neurons to an underlying dynamical system.
Low-rank recurrent neural networks (RNNs) exhibit such interpretability by having tractable dynamics.
Here, we propose to fit low-rank RNNs with variational sequential Monte Carlo methods.
arXiv Detail & Related papers (2024-06-24T15:57:49Z) - Unveiling the Dynamics of Information Interplay in Supervised Learning [10.122733373023074]
We use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process.
Our experiments show that MIR and HDR can effectively explain many phenomena occurring in neural networks.
We introduce MIR and HDR loss terms in supervised and semi-supervised learning to optimize the information interactions among samples and classification heads.
arXiv Detail & Related papers (2024-06-06T12:17:57Z) - An Information Theoretic Evaluation Metric For Strong Unlearning [20.143627174765985]
We introduce the Information Difference Index (IDI), a novel white-box metric inspired by information theory.
IDI quantifies retained information in intermediate features by measuring mutual information between those features and the labels to be forgotten.
Our experiments demonstrate that IDI effectively measures the degree of unlearning across various datasets and architectures.
arXiv Detail & Related papers (2024-05-28T06:57:01Z) - Demolition and Reinforcement of Memories in Spin-Glass-like Neural Networks [0.0]
The aim of this thesis is to understand the effectiveness of Unlearning in both associative memory models and generative models.
The selection of structured data enables an associative memory model to retrieve concepts as attractors of a neural dynamics with considerable basins of attraction.
A novel regularization technique for Boltzmann Machines is presented and shown to outperform previously developed methods at learning hidden probability distributions from data sets.
arXiv Detail & Related papers (2024-03-04T23:12:42Z) - Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look at existing self-supervised methods for speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z) - Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressivity afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Information-Bottleneck-Based Behavior Representation Learning for Multi-agent Reinforcement Learning [16.024781473545055]
In deep reinforcement learning, extracting sufficient and compact information of other agents is critical to attain efficient convergence and scalability of an algorithm.
We present Information-Bottleneck-based Other agents' behavior Representation learning for Multi-agent reinforcement learning (IBORM), which explicitly seeks a low-dimensional mapping encoder.
arXiv Detail & Related papers (2021-09-29T04:22:49Z) - Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z) - Robust Representation Learning via Perceptual Similarity Metrics [18.842322467828502]
Contrastive Input Morphing (CIM) is a representation learning framework that learns input-space transformations of the data.
We show that CIM is complementary to other mutual information-based representation learning techniques.
arXiv Detail & Related papers (2021-06-11T21:45:44Z) - Integrating Auxiliary Information in Self-supervised Learning [94.11964997622435]
We first observe that auxiliary information may provide useful information about data structures.
We propose constructing data clusters according to the auxiliary information.
We show that Cl-InfoNCE may be a better approach for leveraging the data clustering information.
arXiv Detail & Related papers (2021-06-05T11:01:15Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)