Spring-block theory of feature learning in deep neural networks
- URL: http://arxiv.org/abs/2407.19353v4
- Date: Fri, 27 Jun 2025 14:20:12 GMT
- Title: Spring-block theory of feature learning in deep neural networks
- Authors: Cheng Shi, Liming Pan, Ivan Dokmanić
- Abstract summary: We show how feature-learning deep nets collapse data to a regular low-dimensional geometry. We propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.
- Score: 11.396919965037636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this emerges from the collective action of nonlinearity, noise, learning rate, and other factors, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively and propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.
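The mechanical analogy in the title can be pictured with a minimal toy simulation. The sketch below is purely illustrative and is not the paper's actual model: every parameter (friction threshold, noise level, driving speed) is an assumption. A chain of blocks coupled by springs is dragged over a frictional surface with noise, loosely mirroring how layers of a deep net redistribute the "load" of data collapse under nonlinearity and training noise.

```python
import numpy as np

# Toy spring-block chain (illustrative assumptions throughout): N blocks on
# a frictional surface, coupled by unit springs; the last spring is pulled
# at constant speed. Dry friction plays the role of nonlinearity, the
# Gaussian kicks play the role of training noise.
rng = np.random.default_rng(0)
N, steps, dt = 8, 5000, 0.01
mu, sigma = 0.5, 0.05          # friction threshold and noise level (assumed)
x = np.zeros(N)                # block positions
pull = 0.0                     # position of the driving end

for _ in range(steps):
    pull += 0.2 * dt           # constant driving velocity (assumed)
    # Net spring force on each block from its two neighbours
    # (block 0 is anchored to a wall at the origin).
    left = np.concatenate(([0.0], x[:-1]))
    right = np.concatenate((x[1:], [pull]))
    f = (left - x) + (right - x)
    f += sigma * rng.standard_normal(N)
    # Dry friction: a block slides only when |force| exceeds the threshold.
    move = np.abs(f) > mu
    x[move] += dt * (f[move] - mu * np.sign(f[move]))

print(np.round(x, 3))  # displacement profile across the chain
```

The interesting macroscopic observable is the displacement profile along the chain, which in the paper's analogy stands in for how evenly feature learning is distributed across layers.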
Related papers
- Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking) [0.8130739369606821]
In machine learning, layerwise linear models act as simplified representations of neural network dynamics.
These models follow the dynamical feedback principle, which describes how layers mutually govern and amplify each other's evolution.
arXiv Detail & Related papers (2025-02-28T12:52:11Z) - Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks [47.13391046553908]
In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations.
Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically.
These solutions capture the evolution of representations and the Neural Tangent Kernel across the spectrum from the rich to the lazy regimes.
arXiv Detail & Related papers (2024-09-22T23:19:04Z) - Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning. In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures. Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
arXiv Detail & Related papers (2024-08-04T18:47:55Z) - The Impact of Geometric Complexity on Neural Collapse in Transfer Learning [6.554326244334867]
Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse.
arXiv Detail & Related papers (2024-05-24T16:52:09Z) - An effective theory of collective deep learning [1.3812010983144802]
We introduce a minimal model that condenses several recent decentralized algorithms.
We derive an effective theory for linear networks to show that the coarse-grained behavior of our system is equivalent to a deformed Ginzburg-Landau model.
We validate the theory in coupled ensembles of realistic neural networks trained on the MNIST dataset.
arXiv Detail & Related papers (2023-10-19T14:58:20Z) - Geometric Knowledge Distillation: Topology Compression for Graph Neural
Networks [80.8446673089281]
We study a new paradigm of knowledge transfer that aims at encoding graph topological information into graph neural networks (GNNs).
We propose Neural Heat Kernel (NHK) to encapsulate the geometric property of the underlying manifold concerning the architecture of GNNs.
A fundamental and principled solution is derived by aligning NHKs on teacher and student models, dubbed Geometric Knowledge Distillation.
arXiv Detail & Related papers (2022-10-24T08:01:58Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
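The rank-diminishing claim is easy to probe numerically. The sketch below is an illustration under arbitrary assumptions (network width, depth, random initialization, and the use of stable rank as the rank surrogate are all choices made here, not taken from the paper): random data is pushed through a random ReLU network and the stable rank of the representation is tracked layer by layer.

```python
import numpy as np

# Track the stable rank (||A||_F^2 / ||A||_2^2, a smooth rank surrogate)
# of hidden representations through a random ReLU network.
# Width, depth, and initialization scale are illustrative assumptions.
rng = np.random.default_rng(1)
n_samples, width, depth = 64, 32, 6
h = rng.standard_normal((n_samples, width))

def stable_rank(a):
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)

ranks = [stable_rank(h)]
for _ in range(depth):
    w = rng.standard_normal((width, width)) / np.sqrt(width)
    h = np.maximum(h @ w, 0.0)            # ReLU layer
    ranks.append(stable_rank(h))

print([round(r, 1) for r in ranks])  # tends to shrink with depth
```

Because ReLU outputs are nonnegative, a dominant mean direction emerges in the representation, which is one simple mechanism by which effective rank contracts with depth.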
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target; the control signal can then be used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
arXiv Detail & Related papers (2021-06-15T05:30:17Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
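"Linearizing" a network means replacing it with its first-order Taylor expansion in the parameters around initialization. The tiny example below (sizes and the tanh architecture are arbitrary illustrative choices) builds that expansion by hand for a one-hidden-layer network and checks that it matches the exact output for a small parameter perturbation.

```python
import numpy as np

# First-order Taylor expansion of f(W, v) = v . tanh(W x) in the
# parameters (W, v) around an initialization (W0, v0).
# All sizes below are arbitrary illustrative choices.
rng = np.random.default_rng(2)
d, m = 5, 16                        # input dim, hidden width
x = rng.standard_normal(d)
W0 = rng.standard_normal((m, d)) / np.sqrt(d)
v0 = rng.standard_normal(m) / np.sqrt(m)

def f(W, v):
    return v @ np.tanh(W @ x)

a = np.tanh(W0 @ x)
gW = np.outer(v0 * (1.0 - a**2), x)  # df/dW at (W0, v0)
gv = a                               # df/dv at (W0, v0)

eps = 1e-3
dW = eps * rng.standard_normal((m, d))
dv = eps * rng.standard_normal(m)

exact = f(W0 + dW, v0 + dv)
linear = f(W0, v0) + np.sum(gW * dW) + gv @ dv
err = abs(exact - linear)
print(err)  # second-order in eps, so tiny for small perturbations
```

In the infinite-width NTK regime the parameters stay close enough to initialization that this linear model describes the whole of training; the paper above asks what such linearizations can and cannot say at finite width.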
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Learning without gradient descent encoded by the dynamics of a neurobiological model [7.952666139462592]
We introduce a conceptual approach to machine learning that takes advantage of a neurobiologically derived model of dynamic signaling.
We show that MNIST images can be uniquely encoded and classified, in an unsupervised way, by the dynamics of geometric networks with nearly state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-16T07:03:04Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
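The core idea of connectivity-as-parameters can be sketched in a few lines. Everything below is a hypothetical illustration, not the paper's method: K computation nodes form a complete DAG, and each edge j → k carries a learnable scalar squashed through a sigmoid, so the connection pattern itself is differentiable.

```python
import numpy as np

# Sketch of learnable connectivity (illustrative assumptions throughout):
# K computation nodes in a complete DAG; each edge j -> k is gated by a
# learnable scalar alpha[j, k], squashed to (0, 1) by a sigmoid so the
# connectivity pattern is differentiable and could be trained end to end.
rng = np.random.default_rng(4)
K, d = 4, 8
alpha = rng.standard_normal((K, K))      # raw edge parameters
gate = 1.0 / (1.0 + np.exp(-alpha))      # differentiable edge strengths
Wmix = rng.standard_normal((K, d, d)) / np.sqrt(d)

h = [rng.standard_normal(d)]             # node 0 holds the input
for k in range(1, K):
    # Aggregate all earlier nodes, weighted by the gated edges into k.
    agg = sum(gate[j, k] * h[j] for j in range(k))
    h.append(np.tanh(Wmix[k] @ agg))

print(h[-1].shape)  # (8,)
```

Since the gates live in (0, 1) and enter the forward pass multiplicatively, gradients with respect to `alpha` exist, which is what makes the connectivity searchable by ordinary gradient descent.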
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Mastering high-dimensional dynamics with Hamiltonian neural networks [0.0]
A map-building perspective elucidates the superiority of Hamiltonian neural networks over conventional neural networks.
The results clarify the critical relation between data, dimension, and neural network learning performance.
arXiv Detail & Related papers (2020-07-28T21:14:42Z) - An analytic theory of shallow networks dynamics for hinge loss
classification [14.323962459195771]
We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address, in a simple setting, several phenomena appearing in modern networks, such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting.
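The setting studied in that paper is simple enough to reproduce in miniature. The sketch below is an illustration under assumed choices (dataset, widths, learning rate, ReLU hidden layer, and full-batch gradient descent are all picked here for concreteness): a single-hidden-layer network trained with hinge loss on a linearly separable 2-D dataset.

```python
import numpy as np

# Single hidden layer trained with hinge loss on separable 2-D data.
# All hyperparameters below are illustrative assumptions.
rng = np.random.default_rng(3)
n, m, lr = 40, 20, 0.2
X = rng.standard_normal((n, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])        # linearly separable labels
W = rng.standard_normal((m, 2)) / np.sqrt(2)
v = rng.standard_normal(m) / np.sqrt(m)

for _ in range(2000):
    H = np.maximum(X @ W.T, 0.0)            # ReLU hidden activations
    out = H @ v
    margin = y * out
    active = (margin < 1.0).astype(float)   # where the hinge is active
    # Gradients of the mean hinge loss, mean(max(0, 1 - y * out)).
    g_out = -(active * y) / n
    gv = H.T @ g_out
    gW = (np.outer(g_out, v) * (H > 0)).T @ X
    v -= lr * gv
    W -= lr * gW

H = np.maximum(X @ W.T, 0.0)
acc = float(np.mean(np.sign(H @ v) == y))
print(acc)
```

Watching the margin distribution over training in a setup like this is one way to see the "slowing down" and rich/lazy crossover phenomena that the analytic theory describes.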
arXiv Detail & Related papers (2020-06-19T16:25:29Z) - Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as greater stability and better performance than their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z) - Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks.
In particular, we study the emergence of network motifs in multi-layer perceptrons.
arXiv Detail & Related papers (2019-12-27T17:05:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.