The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features
- URL: http://arxiv.org/abs/2410.23169v1
- Date: Wed, 30 Oct 2024 16:20:39 GMT
- Title: The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features
- Authors: Connall Garrod, Jonathan P. Keating,
- Abstract summary: Deep neural networks exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse.
Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model.
This is attributed to a low-rank bias induced by regularization, which favors solutions with lower-rank than those typically associated with deep neural collapse.
- Score: 0.0
- License:
- Abstract: Modern deep neural networks have been observed to exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse. This phenomenon has also been noted in layers beyond the final one, an extension known as deep neural collapse. Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model, an approximation of an expressive network. This is attributed to a low-rank bias induced by regularization, which favors solutions with lower-rank than those typically associated with deep neural collapse. In this work, we extend these observations to the cross-entropy loss and analyze how the low-rank bias influences various solutions. Additionally, we explore how this bias induces specific structures in the singular values of the weights at global optima. Furthermore, we examine the loss surface of these models and provide evidence that the frequent observation of deep neural collapse in practice, despite its suboptimality, may result from its higher degeneracy on the loss surface.
Related papers
- Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal? [21.05674840609307]
Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC)
We focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift.
The main culprit is a low-rank bias of multi-layer regularization schemes.
arXiv Detail & Related papers (2024-05-23T11:55:49Z) - Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model [0.0]
We study low-dimensional structure in the weights, Hessian's, gradients, and feature vectors of deep neural networks.
We show how they can be unified within a generalized unconstrained feature model.
arXiv Detail & Related papers (2024-04-09T08:17:32Z) - Neural Rank Collapse: Weight Decay and Small Within-Class Variability
Yield Low-Rank Bias [4.829265670567825]
We show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with networks' neural collapse properties.
As the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers.
arXiv Detail & Related papers (2024-02-06T13:44:39Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - On the Robustness of Neural Collapse and the Neural Collapse of Robustness [6.227447957721122]
Neural Collapse refers to the curious phenomenon in the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex)
We study the stability properties of these simplices, and find that the simplex structure disappears under small adversarial attacks.
We identify novel properties of both robust and non-robust machine learning models, and show that earlier, unlike later layers maintain reliable simplices on perturbed data.
arXiv Detail & Related papers (2023-11-13T16:18:58Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z) - An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM)
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.