Limitations of Neural Collapse for Understanding Generalization in Deep Learning
- URL: http://arxiv.org/abs/2202.08384v1
- Date: Thu, 17 Feb 2022 00:20:12 GMT
- Title: Limitations of Neural Collapse for Understanding Generalization in Deep Learning
- Authors: Like Hui, Mikhail Belkin, Preetum Nakkiran
- Abstract summary: Recent work of Papyan, Han, & Donoho presented an intriguing "Neural Collapse" phenomenon.
Our motivation is to study the upper limits of this research program.
- Score: 25.48346719747956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent work of Papyan, Han, & Donoho (2020) presented an intriguing
"Neural Collapse" phenomenon, showing a structural property of interpolating
classifiers in the late stage of training. This opened a rich area of
exploration studying this phenomenon. Our motivation is to study the upper
limits of this research program: How far will understanding Neural Collapse
take us in understanding deep learning? First, we investigate its role in
generalization. We refine the Neural Collapse conjecture into two separate
conjectures: collapse on the train set (an optimization property) and collapse
on the test distribution (a generalization property). We find that while Neural
Collapse often occurs on the train set, it does not occur on the test set. We
thus conclude that Neural Collapse is primarily an optimization phenomenon,
with as-yet-unclear connections to generalization. Second, we investigate the
role of Neural Collapse in feature learning. We show simple, realistic
experiments where training longer leads to worse last-layer features, as
measured by transfer-performance on a downstream task. This suggests that
neural collapse is not always desirable for representation learning, as
previously claimed. Finally, we give preliminary evidence of a "cascading
collapse" phenomenon, wherein some form of Neural Collapse occurs not only for
the last layer, but in earlier layers as well. We hope our work encourages the
community to continue the rich line of Neural Collapse research, while also
considering its inherent limitations.
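The paper's central measurement can be made concrete. Below is a minimal sketch (not the authors' code) of an NC1-style within-class variability statistic, computed once on train-split features and once on test-split features. The names `train_feats`, `train_labels`, `test_feats`, and `test_labels` are placeholders for last-layer features extracted from a trained classifier; the Neural Collapse literature typically uses the pseudo-inverse form tr(Sigma_W Sigma_B^+), but the simpler scatter ratio below captures the same qualitative behavior.

```python
# Minimal sketch (assumed names, not the paper's code): an NC1-style
# within-class variability statistic. Near-zero values indicate that
# each class's features have collapsed onto their class mean.
import numpy as np

def nc1_variability(features: np.ndarray, labels: np.ndarray) -> float:
    """Within-class scatter divided by between-class scatter."""
    global_mean = features.mean(axis=0)
    within = 0.0
    between = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Spread of samples around their own class mean.
        within += ((class_feats - class_mean) ** 2).sum()
        # Spread of the class mean around the global mean, weighted by class size.
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()
    return float(within / between)

# The two refined conjectures correspond to two evaluations:
# nc1_variability(train_feats, train_labels) -> near 0 (optimization property)
# nc1_variability(test_feats, test_labels)   -> bounded away from 0
```

The same frozen features can also be fed to a linear probe on a downstream task, which is how the paper's second observation (longer training can degrade transfer performance even as train-set collapse strengthens) is measured.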
Related papers
- Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal? [21.05674840609307] (2024-05-23)
Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC).
We focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift.
The main culprit is a low-rank bias of multi-layer regularization schemes.
- Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse [19.279084204631204] (2024-05-09)
We extend the investigation of Neural Collapse to biased datasets with imbalanced attributes.
We propose an avoid-shortcut learning framework without additional training complexity.
With well-designed shortcut primes based on Neural Collapse structure, the models are encouraged to skip the pursuit of simple shortcuts.
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946] (2024-02-21)
Neuro-Symbolic Integration (NeSy) combines neural approaches with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
- On the Robustness of Neural Collapse and the Neural Collapse of Robustness [6.227447957721122] (2023-11-13)
Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex).
We study the stability properties of these simplices, and find that the simplex structure disappears under small adversarial attacks.
We identify novel properties of both robust and non-robust machine learning models, and show that earlier layers, unlike later ones, maintain reliable simplices on perturbed data.
- Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145] (2023-10-12)
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the train accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304] (2023-03-07)
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592] (2022-12-05)
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
- Neural Collapse: A Review on Modelling Principles and Generalization [0.0] (2022-06-08)
Neural collapse essentially represents a state at which the within-class variability of final hidden layer outputs is infinitesimally small.
Despite the simplicity of this state, the dynamics and implications of reaching it are yet to be fully understood.
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043] (2022-02-14)
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss.
- Neural collapse with unconstrained features [4.941630596191806] (2020-11-23)
We propose a simple "unconstrained features model" in which neural collapse also emerges empirically.
By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk.
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594] (2020-11-18)
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
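For reference, the simplex geometry and within-class variability that several of these entries mention are usually formalized as follows (standard definitions from the Neural Collapse literature; the notation is ours, not taken from any single paper above). Let $h_i$ be the last-layer feature of example $i$ with label $y_i$, $\mu_c$ the mean feature of class $c$, and $\mu_G$ the global feature mean over all $N$ examples and $C$ classes. Then the first two collapse properties read:

$$\Sigma_W = \frac{1}{N} \sum_{c=1}^{C} \sum_{i:\, y_i = c} (h_i - \mu_c)(h_i - \mu_c)^\top \to 0 \qquad \text{(NC1: within-class variability collapse)}$$

and the normalized centered class means $\tilde{\mu}_c = (\mu_c - \mu_G) / \lVert \mu_c - \mu_G \rVert$ converge to the vertices of a simplex equiangular tight frame:

$$\tilde{\mu}_c^\top \tilde{\mu}_{c'} \to \frac{C\,\delta_{c c'} - 1}{C - 1} \qquad \text{(NC2: simplex ETF)}$$

Collapse "on the test distribution," in the sense of the main abstract above, means these limits also hold for features of held-out samples; that is the part the paper finds does not occur.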
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.