Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?
- URL: http://arxiv.org/abs/2405.14468v2
- Date: Mon, 21 Oct 2024 11:54:16 GMT
- Title: Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?
- Authors: Peter Súkeník, Marco Mondelli, Christoph Lampert
- Abstract summary: Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC).
We focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift.
The main culprit is a low-rank bias of multi-layer regularization schemes.
- Score: 21.05674840609307
- License:
- Abstract: Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has recently investigated the propagation of neural collapse to earlier layers of DNNs -- a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to special cases: linear models, only two layers or binary classification. In contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two layers or two classes, DNC stops being optimal for the deep unconstrained features model (DUFM) -- the standard theoretical framework for the analysis of collapse. The main culprit is a low-rank bias of multi-layer regularization schemes: this bias leads to optimal solutions of even lower rank than neural collapse. We support our theoretical findings with experiments on both DUFM and real data, which show the emergence of the low-rank structure in the solution found by gradient descent.
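As a rough illustration of the framework in the abstract, the sketch below trains a small deep unconstrained features model (DUFM) by full-batch gradient descent with weight decay on every layer and then inspects the numerical rank of the learned features. Layer sizes, the squared loss, and all hyper-parameters are assumptions chosen for illustration, not the paper's exact configuration.

```python
# Minimal DUFM sketch (illustrative only): first-layer features are free variables,
# deeper layers are linear maps with ReLU in between, and every layer is regularized.
import torch

K, n, d, L = 4, 10, 32, 3                       # classes, samples per class, width, depth (assumed)
N = K * n
Y = torch.eye(K).repeat_interleave(n, dim=1)    # K x N one-hot label matrix, samples grouped by class

H1 = (0.1 * torch.randn(d, N)).requires_grad_()                       # free features
Ws = [(torch.randn(d, d) / d ** 0.5).requires_grad_() for _ in range(L - 1)]
WL = (torch.randn(K, d) / d ** 0.5).requires_grad_()                  # linear classifier head
params = [H1, *Ws, WL]
opt = torch.optim.SGD(params, lr=0.05)
lam = 5e-3                                       # weight decay applied to every layer (assumed value)

for step in range(10000):
    opt.zero_grad()
    H = H1
    for W in Ws:
        H = torch.relu(W @ H)
    loss = 0.5 * (WL @ H - Y).pow(2).mean() + lam * sum(p.pow(2).sum() for p in params)
    loss.backward()
    opt.step()

# The paper's claim: beyond two layers or two classes, the optimum can have rank strictly
# below the rank-K structure predicted by deep neural collapse. Check the numerical rank.
with torch.no_grad():
    svals = torch.linalg.svdvals(H1)
    print("numerical rank of H1:", int((svals > 1e-3 * svals[0]).sum()))
```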
Related papers
- The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features [0.0]
Deep neural networks exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse.
Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model.
This is attributed to a low-rank bias induced by regularization, which favors solutions of lower rank than those typically associated with deep neural collapse.
arXiv Detail & Related papers (2024-10-30T16:20:39Z) - Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse [32.06666853127924]
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a symmetric geometric structure referred to as neural collapse.
Here, the features of the penultimate layer are free variables, which makes the model data-agnostic and, hence, calls into question its ability to capture DNN training.
We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers, and (ii) bounded conditioning of the features before the linear part.
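The guarantee just summarized is phrased in terms of (i) balancedness of the linear layers and (ii) bounded conditioning of the features before the linear part. The hedged sketch below computes one common version of each quantity; the precise definitions used in the paper may differ.

```python
# Illustrative diagnostics (assumed definitions, not necessarily the paper's exact ones).
import torch

def balancedness_gap(W_next: torch.Tensor, W: torch.Tensor) -> float:
    # One standard notion of balancedness: W_{l+1}^T W_{l+1} close to W_l W_l^T.
    return torch.norm(W_next.T @ W_next - W @ W.T).item()

def feature_conditioning(H: torch.Tensor) -> float:
    # Ratio of largest to smallest non-negligible singular value of the (dim x samples) features.
    s = torch.linalg.svdvals(H)
    return (s[0] / s[s > 1e-8][-1]).item()

W1, W2 = torch.randn(64, 32), torch.randn(10, 64)   # two stacked linear layers (placeholder sizes)
H = torch.randn(32, 256)                            # features entering the linear part (placeholder)
print(balancedness_gap(W2, W1), feature_conditioning(H))
```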
arXiv Detail & Related papers (2024-10-07T10:16:40Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
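To make the filtration underlying neural persistence concrete, the sketch below computes zero-dimensional persistence for a single layer's bipartite weight graph with a union-find pass over edges in decreasing weight order. It only illustrates the single-layer baseline, not the paper's whole-network (deep graph) extension, and the normalization and choice of norm are assumptions.

```python
# Rough single-layer neural-persistence sketch (conventions are assumptions).
import numpy as np

def neural_persistence(W, p=2):
    n_in, n_out = W.shape
    w = np.abs(W) / np.abs(W).max()                      # normalize weights to [0, 1]

    # Bipartite graph edges, processed in decreasing filtration value.
    edges = sorted(((w[i, j], i, n_in + j) for i in range(n_in) for j in range(n_out)), reverse=True)

    parent = list(range(n_in + n_out))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]                # path halving
            x = parent[x]
        return x

    # Every vertex is born at filtration value 1; a connected component dies when a
    # maximum-spanning-tree edge merges it into another component.
    persistences = []
    for weight, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            persistences.append(1.0 - weight)
    return np.linalg.norm(persistences, ord=p)

print(neural_persistence(np.random.randn(16, 8)))
```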
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks [44.31777384413466]
Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data.
In this paper, we focus on node-wise classification and explore the feature evolution through the lens of the "Neural Collapse" phenomenon.
We show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse.
arXiv Detail & Related papers (2023-07-04T23:03:21Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model [21.79259092920587]
We show that in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of deep neural collapse (DNC).
We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for DNC.
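One way to probe whether a gradient-descent solution exhibits the properties typical of deep neural collapse is to report, layer by layer, the within-class variability of the features and the off-diagonal mass of the centered class-mean Gram matrix. The diagnostic below is a hypothetical sketch run on placeholder features; all names, shapes, and thresholds are assumptions.

```python
# Hypothetical per-layer DNC diagnostic (placeholder data, illustrative only).
import torch

def dnc_report(features, labels):
    for l, H in enumerate(features):                     # H: (dim_l, num_samples)
        means = torch.stack([H[:, labels == c].mean(dim=1) for c in labels.unique()], dim=1)
        centered = H - means[:, labels]                  # subtract each sample's class mean
        within = centered.pow(2).mean().item()           # within-class variability (DNC1-style)
        M = means - means.mean(dim=1, keepdim=True)      # centered class means
        gram = (M.T @ M) / M.pow(2).sum().clamp(min=1e-12)
        off_diag = (gram - torch.diag(torch.diag(gram))).abs().max().item()   # orthogonality proxy
        print(f"layer {l}: within-class var {within:.2e}, class-mean off-diagonal {off_diag:.2e}")

labels = torch.randint(0, 2, (200,))                     # binary classification, as in the paper
features = [torch.randn(32, 200) for _ in range(3)]      # placeholder per-layer features
dnc_report(features, labels)
```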
arXiv Detail & Related papers (2023-05-22T15:51:28Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Neural Collapse: A Review on Modelling Principles and Generalization [0.0]
Neural collapse essentially represents a state in which the within-class variability of final hidden layer outputs is infinitesimally small.
Despite the simplicity of this state, the dynamics and implications of reaching it are yet to be fully understood.
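The within-class variability mentioned here is commonly quantified by an NC1-style metric. A minimal sketch, assuming last-layer features H and integer labels y, is the ratio of within-class to between-class scatter, which tends to zero as features collapse onto their class means; the exact normalization varies across papers.

```python
# Minimal NC1-style metric: Tr(Sigma_W) / Tr(Sigma_B) (normalization is an assumption).
import numpy as np

def nc1_metric(H, y):
    # H: (dim, N) last hidden-layer features, y: (N,) integer class labels.
    global_mean = H.mean(axis=1, keepdims=True)
    sigma_w, sigma_b = 0.0, 0.0
    for c in np.unique(y):
        Hc = H[:, y == c]
        mu_c = Hc.mean(axis=1, keepdims=True)
        sigma_w += ((Hc - mu_c) ** 2).sum()                           # within-class scatter
        sigma_b += Hc.shape[1] * ((mu_c - global_mean) ** 2).sum()    # between-class scatter
    return sigma_w / sigma_b

y = np.repeat(np.arange(4), 50)
H = np.random.randn(64, 4)[:, y] + 0.01 * np.random.randn(64, 200)   # nearly collapsed features
print(nc1_metric(H, y))                                              # small value indicates collapse
```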
arXiv Detail & Related papers (2022-06-08T17:55:28Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose a new approach, called LocalDrop, for regularizing neural networks via the local Rademacher complexity.
Based on the proposed upper bound of the local Rademacher complexity, a new regularization function is developed for both fully-connected networks (FCNs) and convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-01T03:10:11Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.