Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
- URL: http://arxiv.org/abs/2501.19104v2
- Date: Tue, 04 Feb 2025 12:16:07 GMT
- Title: Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
- Authors: Diyuan Wu, Marco Mondelli
- Abstract summary: Neural Collapse is a phenomenon where the last-layer representations of a well-trained neural network converge to a highly structured geometry.
In this paper, we focus on its first (and most basic) property, known as NC1.
We analyze NC1 in a three-layer neural network, with the first two layers operating in the mean-field regime and followed by a linear layer.
- Score: 17.726510286383697
- Abstract: Neural Collapse is a phenomenon where the last-layer representations of a well-trained neural network converge to a highly structured geometry. In this paper, we focus on its first (and most basic) property, known as NC1: the within-class variability vanishes. While prior theoretical studies establish the occurrence of NC1 via the data-agnostic unconstrained features model, our work adopts a data-specific perspective, analyzing NC1 in a three-layer neural network, with the first two layers operating in the mean-field regime and followed by a linear layer. In particular, we establish a fundamental connection between NC1 and the loss landscape: we prove that points with small empirical loss and gradient norm (thus, close to being stationary) approximately satisfy NC1, and the closeness to NC1 is controlled by the residual loss and gradient norm. We then show that (i) gradient flow on the mean squared error converges to NC1 solutions with small empirical loss, and (ii) for well-separated data distributions, both NC1 and vanishing test loss are achieved simultaneously. This aligns with the empirical observation that NC1 emerges during training while models attain near-zero test error. Overall, our results demonstrate that NC1 arises from gradient training due to the properties of the loss landscape, and they show the co-occurrence of NC1 and small test error for certain data distributions.
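As a concrete reading of NC1, the sketch below shows one standard way to measure within-class variability collapse: the ratio of within-class to between-class scatter of the last-layer features, which tends to zero under NC1. This is a minimal NumPy illustration; the metric, toy data, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def nc1_metric(features, labels):
    """Within-class scatter relative to between-class scatter.

    A common NC1 proxy: trace(Sigma_W) / trace(Sigma_B), which
    approaches 0 as within-class variability collapses.
    features: (n, d) last-layer representations
    labels:   (n,) integer class labels
    """
    global_mean = features.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        sw += ((fc - mu_c) ** 2).sum()                      # within-class scatter
        sb += len(fc) * ((mu_c - global_mean) ** 2).sum()   # between-class scatter
    return sw / sb

# Toy check: nearly collapsed features yield a tiny NC1 value.
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 8))                 # 3 class means in 8 dimensions
labels = np.repeat(np.arange(3), 50)
features = means[labels] + 1e-3 * rng.normal(size=(150, 8))
print(nc1_metric(features, labels))             # ~1e-6, i.e. close to NC1
```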
Related papers
- Beyond Unconstrained Features: Neural Collapse for Shallow Neural Networks with General Data [0.8594140167290099]
Neural collapse (NC) is a phenomenon that emerges in the terminal phase of training deep neural networks (DNNs).
We provide a complete characterization of when NC occurs for two- or three-layer neural networks.
arXiv Detail & Related papers (2024-09-03T12:30:21Z)
- Kernel vs. Kernel: Exploring How the Data Structure Affects Neural Collapse [9.975341265604577]
"Neural Collapse" is the decrease in the within class variability of the network's deepest features, dubbed as NC1.
We provide a kernel-based analysis that does not suffer from this limitation.
We show that the NTK does not represent more collapsed features than the NNGP for prototypical data models.
arXiv Detail & Related papers (2024-06-04T08:33:56Z)
- Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
arXiv Detail & Related papers (2024-02-29T06:02:45Z)
- Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continued training can still improve accuracy on the test set, even after the train accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z)
- Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model [21.79259092920587]
We show that in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of deep neural collapse (DNC).
We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for DNC.
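To make this setup concrete, here is a hedged NumPy sketch of a deep unconstrained features model in miniature: the features H are free optimization variables followed by two linear layers, trained by plain gradient descent on a regularized MSE, after which the within-class variance of H shrinks toward zero. Dimensions, learning rate, and regularization strength are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, per = 2, 16, 20                  # classes, feature dim, samples per class
n = K * per
labels = np.repeat(np.arange(K), per)
Y = np.eye(K)[labels].T                # one-hot targets, shape (K, n)

# Free features H plus two linear layers: the "deep unconstrained features" idea.
H  = rng.normal(size=(d, n)) * 0.5
W1 = rng.normal(size=(d, d)) * 0.5
W2 = rng.normal(size=(K, d)) * 0.5
lr, lam = 0.1, 5e-3                    # step size and weight decay (illustrative)

for _ in range(10_000):
    P = W2 @ W1 @ H - Y                # MSE residual, shape (K, n)
    gW2 = P @ (W1 @ H).T / n + lam * W2
    gW1 = W2.T @ P @ H.T / n + lam * W1
    gH  = (W2 @ W1).T @ P / n + lam * H
    W2 -= lr * gW2; W1 -= lr * gW1; H -= lr * gH

print("loss:", 0.5 * (P ** 2).sum() / n)
# Within-class variance of the free features; it decays toward 0 (NC1-style collapse).
print("within-class var:", sum(H[:, labels == c].var(axis=1).sum() for c in range(K)))
```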
arXiv Detail & Related papers (2023-05-22T15:51:28Z)
- Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data [12.225207401994737]
We show that complex systems with massive numbers of parameters exhibit the same structural properties when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate that the last-layer features and classifiers converge to a highly structured geometry.
arXiv Detail & Related papers (2023-01-01T16:29:56Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth-2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
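The model in this entry is simple enough to spell out. Below is a minimal NumPy sketch of the summarized setting: a depth-2 ReLU network trained by gradient descent on a quadratic loss over Gaussian inputs. Random sign labels stand in for the paper's adversarial labels, and only the hidden layer is trained with a fixed random output layer, a common simplification that need not match the paper; width and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 20, 512                  # samples, input dimension, hidden width
X = rng.normal(size=(n, d))            # Gaussian data instances
y = rng.choice([-1.0, 1.0], size=n)    # arbitrary sign labels

W = rng.normal(size=(m, d))            # trained hidden layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output layer

lr = 0.5
for step in range(2001):
    pre = X @ W.T                      # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)         # ReLU
    r = act @ a - y                    # residuals
    if step % 500 == 0:
        print(step, 0.5 * (r ** 2).mean())  # quadratic loss, driven toward 0
    # Gradient of the quadratic loss with respect to the hidden weights only.
    gW = ((r[:, None] * (pre > 0) * a[None, :]).T @ X) / n
    W -= lr * gW
```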
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling [83.77955213766896]
Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data.
To address their scalability issue, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs.
This paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs.
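To show what graph topology sampling means operationally, here is a small NumPy sketch (illustrative, not the paper's algorithm): each propagation step uses a randomly kept subset of edges, so the cost of the neighborhood aggregation scales with the sampled edges rather than the full graph. The dense adjacency matrix is used only for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 8
X = rng.normal(size=(n, d))                 # node features
edges = rng.integers(0, n, size=(400, 2))   # random undirected edge list

def sampled_propagate(X, edges, keep=0.5):
    """One graph-convolution aggregation over a sampled topology.

    Keeping a random fraction of edges reduces the memory and compute
    of the aggregation step, at the price of a stochastic estimate.
    """
    kept = edges[rng.random(len(edges)) < keep]
    A = np.zeros((n, n))
    for i, j in kept:
        A[i, j] = A[j, i] = 1.0
    A += np.eye(n)                          # self-loops keep every row nonzero
    Dinv = 1.0 / A.sum(axis=1)              # row normalization, D^{-1} A
    return (Dinv[:, None] * A) @ X

H = sampled_propagate(X, edges)
print(H.shape)                              # (100, 8)
```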
arXiv Detail & Related papers (2022-07-07T21:25:55Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
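The mechanism is easy to reproduce in a toy example. The NumPy sketch below (illustrative, not from the paper) fits logistic regression to data where two features are both predictive but one has a much larger margin; once the dominant feature saturates the loss, the gradient for the weaker feature vanishes and its weight stays small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.choice([-1.0, 1.0], size=n)
# Feature 1 separates the classes with a large margin, feature 2 with a small one.
x1 = 2.0 * y + 0.1 * rng.normal(size=n)
x2 = 0.2 * y + 0.1 * rng.normal(size=n)
X = np.stack([x1, x2], axis=1)

w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)).
    g = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * g
print(w)  # the large-margin feature soaks up most of the weight
```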
arXiv Detail & Related papers (2020-11-18T18:52:08Z)