Neural Collapse: A Review on Modelling Principles and Generalization
- URL: http://arxiv.org/abs/2206.04041v2
- Date: Tue, 11 Apr 2023 06:11:14 GMT
- Title: Neural Collapse: A Review on Modelling Principles and Generalization
- Authors: Vignesh Kothapalli
- Abstract summary: Neural collapse essentially represents a state at which the within-class variability of final hidden layer outputs is infinitesimally small.
Despite the simplicity of this state, the dynamics and implications of reaching it are yet to be fully understood.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep classifier neural networks enter the terminal phase of training (TPT)
when training error reaches zero and tend to exhibit intriguing Neural Collapse
(NC) properties. Neural collapse essentially represents a state at which the
within-class variability of final hidden layer outputs is infinitesimally small
and their class means form a simplex equiangular tight frame. This simplifies
the last layer behaviour to that of a nearest-class center decision rule.
Despite the simplicity of this state, the dynamics and implications of reaching
it are yet to be fully understood. In this work, we review the principles which
aid in modelling neural collapse, followed by the implications of this state on
generalization and transfer learning capabilities of neural networks. Finally,
we conclude by discussing potential avenues and directions for future research.
Related papers
- Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse [32.06666853127924]
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a symmetric geometric structure referred to as neural collapse.
Here, the features of the penultimate layer are free variables, which makes the model data-agnostic and, hence, puts into question its ability to capture training.
We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers, and (ii) bounded conditioning of the features before the linear part.
arXiv Detail & Related papers (2024-10-07T10:16:40Z) - Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
A potential solution to this issue is Neuro-Symbolic Integration (NeSy), where neural approaches are combined with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z) - On the Robustness of Neural Collapse and the Neural Collapse of Robustness [6.227447957721122]
Neural Collapse refers to the curious phenomenon in the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex)
We study the stability properties of these simplices, and find that the simplex structure disappears under small adversarial attacks.
We identify novel properties of both robust and non-robust machine learning models, and show that earlier, unlike later layers maintain reliable simplices on perturbed data.
arXiv Detail & Related papers (2023-11-13T16:18:58Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Limitations of Neural Collapse for Understanding Generalization in Deep
Learning [25.48346719747956]
Recent work of Papyan, Han, & Donoho presented an intriguing "Neural Collapse" phenomenon.
Our motivation is to study the upper limits of this research program.
arXiv Detail & Related papers (2022-02-17T00:20:12Z) - An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM)
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse
in Imbalanced Training [39.137793683411424]
We introduce the textitLayer-Peeled Model, a non-yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term textitMinority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
arXiv Detail & Related papers (2021-01-29T17:37:17Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.