Disentangling Rich Dynamics from Feature Learning: A Framework for Independent Measurements
- URL: http://arxiv.org/abs/2410.04264v2
- Date: Wed, 04 Jun 2025 21:53:51 GMT
- Title: Disentangling Rich Dynamics from Feature Learning: A Framework for Independent Measurements
- Authors: Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, Ard A. Louis,
- Abstract summary: We introduce (1) a measure that quantifies the rich regime independently of performance, and (2) interpretable feature metrics for visualization. We reveal how batch normalization and training set size influence lazy/rich dynamics for VGG16 and ResNet18 on CIFAR-10/100.
- Score: 5.369150515904139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, it is widely believed that dynamic feature transformation (the rich regime) enhances predictive performance. However, this link does not always hold, and existing richness measures rely on correlated factors - such as performance or parameter norms - which can complicate the analysis of feature learning. We introduce (1) a measure that quantifies the rich regime independently of performance, and (2) interpretable feature metrics for visualization. Leveraging low-rank bias, our approach generalizes neural collapse metrics and captures lazy-to-rich transitions (e.g., grokking) without relying on performance as a proxy. We reveal how batch normalization and training set size influence lazy/rich dynamics for VGG16 and ResNet18 on CIFAR-10/100, opening avenues for better understanding feature learning.
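The abstract does not give the measure's exact definition, but it does say the approach leverages low-rank bias and generalizes neural collapse metrics. As a minimal, illustrative sketch only (not the paper's measure), the kind of feature statistics such metrics build on can be computed from penultimate-layer features; the function names below (effective_rank, within_between_ratio) are hypothetical:

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """Effective rank of a feature matrix (rows = samples), i.e. the
    exponential of the entropy of its normalized singular values."""
    s = np.linalg.svd(features - features.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def within_between_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Neural-collapse-style proxy: trace of the within-class scatter
    divided by trace of the between-class scatter (smaller = more collapse)."""
    mu_global = features.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        sw += ((fc - mu_c) ** 2).sum()
        sb += len(fc) * ((mu_c - mu_global) ** 2).sum()
    return sw / sb

# Example with random stand-ins for 512-dim penultimate features of 1000 samples.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))
labels = rng.integers(0, 10, size=1000)
print(effective_rank(feats), within_between_ratio(feats, labels))
```

Tracking quantities like these along the training trajectory, rather than test accuracy, is the spirit of measuring lazy-to-rich transitions independently of performance.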
Related papers
- The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z) - Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks [11.365318749216739]
We study the entire training dynamics through the lens of spectral complexity norms. We identify two novel dynamical scaling laws that govern how performance evolves during training. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10, and CIFAR-100.
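The summary does not specify which spectral complexity norm is used; purely as an illustration, a common proxy is the product of the layers' spectral norms (largest singular values), which can be logged at each checkpoint:

```python
import numpy as np

def spectral_complexity(weights):
    """Illustrative proxy: product of the spectral norms of the weight matrices."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

# Two random stand-in weight matrices for a 784 -> 256 -> 10 network.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 784)) / np.sqrt(784),
          rng.normal(size=(10, 256)) / np.sqrt(256)]
print(spectral_complexity(layers))  # log this quantity over the course of training
```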
arXiv Detail & Related papers (2025-05-19T15:13:36Z) - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD) within the tensor program framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Focus On This, Not That! Steering LLMs with Adaptive Feature Specification [48.27684487597968]
Focus Instruction Tuning (FIT) trains large language models to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. We demonstrate that FIT (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features.
arXiv Detail & Related papers (2024-10-30T12:01:48Z) - AggSS: An Aggregated Self-Supervised Approach for Class-Incremental Learning [17.155759991260094]
This paper investigates the impact of self-supervised learning, specifically image rotations, on various class-incremental learning paradigms.
We observe a shift in the deep neural network's attention towards intrinsic object features as it learns through the AggSS strategy.
AggSS serves as a plug-and-play module that can be seamlessly incorporated into any class-incremental learning framework.
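The exact AggSS module is not described in this summary; as a hedged illustration of the underlying image-rotation self-supervision it builds on, a hypothetical helper could rotate each image by a random multiple of 90 degrees and use the rotation index as an auxiliary label:

```python
import numpy as np

def make_rotation_batch(images, rng):
    """Rotate each (H, W, C) image by k*90 degrees and return the rotation labels."""
    rotated, labels = [], []
    for img in images:
        k = int(rng.integers(4))
        rotated.append(np.rot90(img, k=k, axes=(0, 1)))
        labels.append(k)
    return np.stack(rotated), np.array(labels)

rng = np.random.default_rng(0)
imgs = rng.random((8, 32, 32, 3))          # square images so all rotations stack cleanly
x_aug, y_rot = make_rotation_batch(imgs, rng)
print(x_aug.shape, y_rot)                  # rotation labels provide a free auxiliary loss
```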
arXiv Detail & Related papers (2024-08-08T10:16:02Z) - Iterative Feature Boosting for Explainable Speech Emotion Recognition [17.568724398229232]
We present a new supervised SER method based on an efficient feature engineering approach.
We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets.
The proposed method exceeds human-level performance (HLP) and outperforms state-of-the-art machine learning methods for emotion recognition on the TESS dataset.
arXiv Detail & Related papers (2024-05-30T15:44:27Z) - Half-Space Feature Learning in Neural Networks [2.3249139042158853]
There currently exist two extreme viewpoints for neural network feature learning.
Based on a novel viewpoint, we argue that neither interpretation is likely to be correct.
We use this alternate interpretation to motivate a model called the Deep Linearly Gated Network (DLGN).
arXiv Detail & Related papers (2024-04-05T12:03:19Z) - Task adaption by biologically inspired stochastic comodulation [8.59194778459436]
We show that fine-tuning convolutional networks with stochastic gain modulation improves on deterministic gain modulation.
Our results suggest that comodulation representations can enhance learning efficiency and performance in multi-task learning.
arXiv Detail & Related papers (2023-11-25T15:21:03Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics in function space of graph neural networks (GNNs).
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z) - Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Deep Neural Network Classifier for Multi-dimensional Functional Data [4.340040784481499]
We propose a new approach, called the functional deep neural network (FDNN), for classifying multi-dimensional functional data.
Specifically, a deep neural network is trained on the principal components of the training data, which are then used to predict the class label of a future data function.
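The summary describes the pipeline only at a high level; a rough sketch of that idea (principal components of the discretized functional data feeding a deep classifier), using scikit-learn stand-ins rather than the paper's FDNN, might look like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Random stand-in data: 200 functions sampled on a 32x32 grid, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32 * 32))
y = rng.integers(0, 3, size=200)

pca = PCA(n_components=20).fit(X)          # principal components of the training data
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=0).fit(pca.transform(X), y)

x_new = rng.normal(size=(1, 32 * 32))      # a future data function
print(clf.predict(pca.transform(x_new)))   # predicted class label
```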
arXiv Detail & Related papers (2022-05-17T19:22:48Z) - The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks [19.899987851661354]
We study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension.
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting.
A key tool is a new "dimension-free" dynamics approximation that applies to functions defined on a latent low-dimensional subspace.
arXiv Detail & Related papers (2022-02-17T13:43:06Z) - Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
The Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical Polyadic (CP) decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
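The architecture is only named in this summary; a minimal sketch of a rank-R, CP-factorized hidden layer for matrix-valued inputs (illustrative shapes and variable names, not the paper's implementation) shows how vectorization is avoided:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, R, H = 8, 8, 3, 16           # input modes, CP rank, hidden units

A = rng.normal(size=(H, R, I))     # mode-1 factor vectors
B = rng.normal(size=(H, R, J))     # mode-2 factor vectors
bias = np.zeros(H)
W_out = rng.normal(size=(H, 2))    # 2-class readout

def forward(X):
    # hidden_q = sum_r a_{q,r}^T X b_{q,r}: a rank-R bilinear form per unit,
    # applied directly to the matrix input instead of a flattened vector.
    pre = np.einsum('hri,ij,hrj->h', A, X, B) + bias
    h = np.maximum(pre, 0.0)       # ReLU
    return h @ W_out

X = rng.normal(size=(I, J))        # one multilinear (matrix) input
print(forward(X))                  # class scores
```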
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.