Topological Invariance and Breakdown in Learning
- URL: http://arxiv.org/abs/2510.02670v1
- Date: Fri, 03 Oct 2025 02:03:16 GMT
- Title: Topological Invariance and Breakdown in Learning
- Authors: Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin
- Abstract summary: We prove that for a broad class of permutation-equivariant learning rules, the training process induces a bi-Lipschitz mapping between neurons. With a learning rate below a topological critical point $\eta^*$, the training is constrained to preserve all topological structure of the neurons. Above $\eta^*$, the learning process allows for topological simplification, making the neuron manifold progressively coarser.
- Score: 11.698788994819749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove that for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping between neurons and strongly constrains the topology of the neuron distribution during training. This result reveals a qualitative difference between small and large learning rates $\eta$. With a learning rate below a topological critical point $\eta^*$, the training is constrained to preserve all topological structure of the neurons. In contrast, above $\eta^*$, the learning process allows for topological simplification, making the neuron manifold progressively coarser and thereby reducing the model's expressivity. Viewed in combination with the recent discovery of the edge of stability phenomenon, the learning dynamics of neural networks under gradient descent can be divided into two phases: first they undergo smooth optimization under topological constraints, and then enter a second phase where they learn through drastic topological simplifications. A key feature of our theory is that it is independent of specific architectures or loss functions, enabling the universal application of topological methods to the study of deep learning.
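The abstract's claim can be probed empirically: the bi-Lipschitz property implies that below the critical learning rate $\eta^*$, distinct hidden neurons cannot merge during training, while above it they may collapse together. The following is a minimal illustrative sketch (not code from the paper): a tiny one-hidden-layer ReLU network trained by plain gradient descent, measuring the minimum pairwise distance between the neurons' incoming weight vectors. The network size, data, and learning rates are assumptions chosen for illustration.

```python
import numpy as np

# Toy sketch (not from the paper): track the minimum pairwise distance
# between hidden neurons during gradient descent on a tiny ReLU network.
# The paper's bi-Lipschitz result implies that below a critical learning
# rate eta*, distinct neurons cannot merge; above it, neurons may collapse
# together ("topological simplification"). All choices here are illustrative.

X = np.random.default_rng(0).normal(size=(64, 2))   # toy inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]                 # toy regression targets

def min_neuron_gap(lr, steps=2000, hidden=8):
    rng = np.random.default_rng(1)                  # same init for every lr
    W = 0.5 * rng.normal(size=(hidden, 2))          # incoming neuron weights
    a = 0.5 * rng.normal(size=hidden)               # output weights
    n = len(X)
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)                # ReLU activations, (n, hidden)
        err = H @ a - y                             # residuals
        gW = ((np.outer(err, a) * (H > 0)).T @ X) / n  # grad of 0.5*MSE w.r.t. W
        ga = (H.T @ err) / n                        # grad w.r.t. a
        W -= lr * gW
        a -= lr * ga
    # smallest distance between any two neurons' incoming weight vectors;
    # a gap collapsing toward zero signals neurons merging
    d = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    return d[np.triu_indices(hidden, k=1)].min()

print("min neuron gap, small lr:", min_neuron_gap(0.05))
print("min neuron gap, large lr:", min_neuron_gap(2.0))
```

At a small learning rate the gap should stay bounded away from zero, consistent with topology preservation; at a large one, neuron pairs may approach one another or the dynamics may become unstable. The exact threshold $\eta^*$ depends on the setup and is not computed here.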
Related papers
- Topological derivative approach for deep neural network architecture adaptation [0.6144680854063939]
This work presents a novel algorithm for progressively adapting neural network architecture along the depth.
We show that the optimality condition for the shape functional leads to an eigenvalue problem for deep neural architecture adaptation.
Our approach thus determines the most sensitive location along the depth where a new layer needs to be inserted.
arXiv Detail & Related papers (2025-02-08T23:01:07Z)
- Topological obstruction to the training of shallow ReLU neural networks [0.0]
We study the interplay between the geometry of the loss landscape and the optimization trajectories of simple neural networks.
This paper reveals the presence of topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow.
arXiv Detail & Related papers (2024-10-18T19:17:48Z)
- Topological Expressivity of ReLU Neural Networks [0.0]
We study the expressivity of ReLU neural networks in the setting of a binary classification problem from a topological perspective.
Results show that deep ReLU neural networks are exponentially more powerful than shallow ones in terms of topological simplification.
arXiv Detail & Related papers (2023-10-17T10:28:00Z)
- Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics [6.349503549199403]
We provide a comprehensive framework for the learning process of deep wide neural networks.
By characterizing the diffusive phase, our work sheds light on representational drift in the brain.
arXiv Detail & Related papers (2023-09-08T18:00:01Z)
- Decorrelating neurons using persistence [29.25969187808722]
We present two regularisation terms computed from the weights of a minimum spanning tree of a clique.
We demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms.
We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms.
arXiv Detail & Related papers (2023-08-09T11:09:14Z)
- A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks [44.31777384413466]
Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data.
In this paper, we focus on node-wise classification and explore the feature evolution through the lens of the "Neural Collapse" phenomenon.
We show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse.
arXiv Detail & Related papers (2023-07-04T23:03:21Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters, can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Going beyond p-convolutions to learn grayscale morphological operators [64.38361575778237]
We present two new morphological layers based on the same principle as the p-convolutional layer.
arXiv Detail & Related papers (2021-02-19T17:22:16Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.