Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained
Features Model
- URL: http://arxiv.org/abs/2305.13165v1
- Date: Mon, 22 May 2023 15:51:28 GMT
- Title: Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained
Features Model
- Authors: Peter Súkeník, Marco Mondelli, Christoph Lampert
- Abstract summary: We show that in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of deep neural collapse (DNC).
We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for DNC.
- Score: 21.79259092920587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural collapse (NC) refers to the surprising structure of the last layer of
deep neural networks in the terminal phase of gradient descent training.
Recently, an increasing amount of experimental evidence has pointed to the
propagation of NC to earlier layers of neural networks. However, while the NC
in the last layer is well studied theoretically, much less is known about its
multi-layered counterpart - deep neural collapse (DNC). In particular, existing
work focuses either on linear layers or only on the last two layers at the
price of an extra assumption. Our paper fills this gap by generalizing the
established analytical framework for NC - the unconstrained features model - to
multiple non-linear layers. Our key technical contribution is to show that, in
a deep unconstrained features model, the unique global optimum for binary
classification exhibits all the properties typical of DNC. This explains the
existing experimental evidence of DNC. We also empirically show that (i) by
optimizing deep unconstrained features models via gradient descent, the
resulting solution agrees well with our theory, and (ii) trained networks
recover the unconstrained features suitable for the occurrence of DNC, thus
supporting the validity of this modeling principle.
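As a rough illustration of the setup described in the abstract (not the authors' code), the sketch below builds a small deep unconstrained features model for binary classification, trains it by gradient-based optimization on an l2-regularized MSE loss with ReLU non-linearities, and then prints simple DNC diagnostics. The widths, depth, regularization strength, optimizer, and step count are all illustrative assumptions.
```python
# Rough sketch (not the authors' code) of a deep unconstrained features model (DUFM)
# for binary classification. The first-layer features H1 and the weights W2..WL are
# the only trainable objects, trained on an l2-regularized MSE loss; all constants
# (width, depth, regularization strength, optimizer settings) are illustrative.
import torch

torch.manual_seed(0)
K, n, d, L = 2, 10, 32, 3      # classes, samples per class, feature width, DUFM depth
lam = 1e-4                     # assumed regularization strength for every parameter block

def param(*shape, scale=1.0):
    return (scale * torch.randn(*shape)).requires_grad_()

H1 = param(d, K * n)                                        # unconstrained features
Ws = [param(d, d, scale=d ** -0.5) for _ in range(L - 2)]   # hidden (ReLU) layers
WL = param(K, d, scale=d ** -0.5)                           # final linear classifier
Y = torch.eye(K).repeat_interleave(n, dim=1)                # one-hot targets, (K, K*n)

def forward(H1):
    feats, H = [H1], H1
    for W in Ws:
        H = torch.relu(W @ H)
        feats.append(H)
    return WL @ H, feats

opt = torch.optim.Adam([H1, *Ws, WL], lr=1e-2)
for _ in range(5000):
    opt.zero_grad()
    out, _ = forward(H1)
    loss = 0.5 * ((out - Y) ** 2).mean() + lam * sum((p ** 2).sum() for p in [H1, *Ws, WL])
    loss.backward()
    opt.step()

# DNC diagnostics: within-class variability should shrink at every layer, and the
# class-mean Gram matrices / classifier alignment should approach the DNC geometry.
with torch.no_grad():
    _, feats = forward(H1)
    for layer, H in enumerate(feats, start=1):
        Hc = H.reshape(d, K, n)
        means = Hc.mean(dim=2)                                   # (d, K) class means
        within = (Hc - means.unsqueeze(2)).pow(2).mean().item()  # DNC1-style check: -> 0
        print(f"layer {layer}: within-class variability {within:.2e}")
        print("class-mean Gram:\n", means.T @ means)
    print("W_L applied to last-layer class means:\n",
          WL @ feats[-1].reshape(d, K, n).mean(dim=2))
```
Point (i) of the abstract corresponds to checking that the solution found by such gradient-based optimization matches the theoretically optimal DNC structure.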
Related papers
- Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal? [21.05674840609307]
Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC).
We focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift.
The main culprit is a low-rank bias of multi-layer regularization schemes.
arXiv Detail & Related papers (2024-05-23T11:55:49Z)
- Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
arXiv Detail & Related papers (2024-02-29T06:02:45Z)
- Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on test set, even after the train accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z)
- Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay [0.6813925418351435]
Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks.
We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC.
Our experiments substantiate theoretical insights by showing that models demonstrate a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm.
arXiv Detail & Related papers (2023-09-09T00:05:45Z)
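The entry above speaks of a "stronger presence of NC"; a widely used way to quantify the within-class collapse (the NC1 property) is the trace ratio of within-class to between-class covariance. The helper below is an illustration under that convention, not code from that paper, and its name and normalization are assumptions.
```python
# Hedged illustration: a common NC1 statistic, tr(Sigma_W @ pinv(Sigma_B)) / K, where
# Sigma_W is the within-class covariance of the features and Sigma_B the between-class
# covariance of the class means. Smaller values indicate stronger collapse.
import numpy as np

def nc1_statistic(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (N, d) feature matrix; labels: (N,) integer class labels."""
    classes = np.unique(labels)
    K, (N, d) = len(classes), features.shape
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c in classes:
        h_c = features[labels == c]
        mu_c = h_c.mean(axis=0)
        sigma_w += (h_c - mu_c).T @ (h_c - mu_c) / N
        sigma_b += np.outer(mu_c - global_mean, mu_c - global_mean) / K
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / K)
```
Tracking this statistic across layers, rather than only at the last layer, is the natural way to probe the deep version of the phenomenon studied in the main paper.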
- Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data [12.225207401994737]
We show that deep neural networks with massive numbers of parameters exhibit the same structural properties when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors.
arXiv Detail & Related papers (2023-01-01T16:29:56Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
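For reference, a commonly used form of the (single-layer) unconstrained features model with a regularized MSE loss, as mentioned in the entry above, treats the last-layer features as free optimization variables; the exact variant in that paper (bias terms, regularization constants) may differ:
```latex
\min_{W \in \mathbb{R}^{K \times d},\; H \in \mathbb{R}^{d \times N}}
\;\frac{1}{2N}\,\lVert W H - Y \rVert_F^2
\;+\;\frac{\lambda_W}{2}\,\lVert W \rVert_F^2
\;+\;\frac{\lambda_H}{2}\,\lVert H \rVert_F^2
```
Here each column of H is the freely optimized feature vector of one training sample and Y is the one-hot label matrix; the main paper above generalizes this kind of program to multiple non-linear layers.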
- An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM).
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize the training, but sometimes result in the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
arXiv Detail & Related papers (2020-02-25T11:40:27Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)