Generalized Neural Collapse for a Large Number of Classes
- URL: http://arxiv.org/abs/2310.05351v3
- Date: Fri, 27 Oct 2023 14:35:14 GMT
- Title: Generalized Neural Collapse for a Large Number of Classes
- Authors: Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You and Zhihui Zhu
- Abstract summary: We provide an empirical study verifying the occurrence of generalized neural collapse in practical deep neural networks.
We also provide a theoretical study showing that generalized neural collapse provably occurs under an unconstrained feature model with a spherical constraint.
- Score: 33.46269920297418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural collapse provides an elegant mathematical characterization of learned
last layer representations (a.k.a. features) and classifier weights in deep
classification models. Such results not only provide insights but also motivate
new techniques for improving practical deep models. However, most of the
existing empirical and theoretical studies of neural collapse focus on the case
where the number of classes is small relative to the dimension of the feature
space. This paper extends neural collapse to cases where the number of classes
is much larger than the dimension of the feature space, a setting that occurs
broadly in language models, retrieval systems, and face recognition
applications. We show that the features and classifier exhibit a generalized
neural collapse phenomenon in which the minimum one-vs-rest margin is
maximized. We provide an empirical study to verify the occurrence of
generalized neural collapse in practical deep neural networks. Moreover, we
provide a theoretical study showing that generalized neural collapse provably
occurs under the unconstrained feature model with a spherical constraint, given
certain technical conditions on the feature dimension and the number of
classes.
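The margin notion at the heart of this result is easy to make concrete. Below is a minimal numerical sketch of the minimum one-vs-rest margin for unit-norm features and classifier weights, matching the spherical constraint described in the abstract; the specific margin formula (true-class score minus the best rival score) and the choice of aligning classifier weights with class features are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 100, 16  # number of classes K much larger than feature dimension d

# Unconstrained feature model with a spherical constraint: one unit-norm
# feature per class; classifier weights are taken to align with the features
# (an illustrative assumption about the learned configuration).
H = rng.normal(size=(K, d))
H /= np.linalg.norm(H, axis=1, keepdims=True)
W = H.copy()

def min_one_vs_rest_margin(W, H):
    """Minimum over classes of (true-class score) - (best rival score)."""
    scores = H @ W.T                   # scores[k, j] = <h_k, w_j>
    true = np.diag(scores).copy()
    np.fill_diagonal(scores, -np.inf)  # exclude the true class from rivals
    return (true - scores.max(axis=1)).min()

print(f"min one-vs-rest margin: {min_one_vs_rest_margin(W, H):.4f}")
```

When K exceeds d + 1 the class vectors can no longer form a simplex equiangular tight frame, so maximizing this margin becomes a packing-type problem on the sphere, which is why the classical description of neural collapse no longer applies verbatim.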
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- The Impact of Geometric Complexity on Neural Collapse in Transfer Learning [6.554326244334867]
Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics.
We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse.
arXiv Detail & Related papers (2024-05-24T16:52:09Z)
- A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes [49.32130498861987]
We study the case of non-differentiable activation functions, such as ReLU.
Two recent works introduced a geometric framework to study neural networks.
We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.
arXiv Detail & Related papers (2024-04-09T08:11:46Z)
- Neural Dependencies Emerging from Learning Massive Categories [94.77992221690742]
This work presents two astonishing findings on neural networks learned for large-scale image classification.
1) Given a well-trained model, the logits predicted for some category can be directly obtained by linearly combining the predictions of a few other categories.
2) Neural dependencies exist not only within a single model, but even between two independently learned models.
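Finding 1) can be phrased as a small regression problem. The sketch below fits one category's logit as a sparse-ish linear combination of the others on synthetic data; the construction and the plain least-squares fit are illustrative assumptions, not the paper's actual procedure or models.

```python
import numpy as np

rng = np.random.default_rng(2)
n, C = 1000, 100  # number of samples and categories (synthetic logits)

# Synthetic logits where category 0 is, by construction, a combination of
# categories 1-3 plus noise; the paper studies real network outputs instead.
logits = rng.normal(size=(n, C))
logits[:, 0] = (0.6 * logits[:, 1] - 0.3 * logits[:, 2]
                + 0.1 * logits[:, 3] + 0.01 * rng.normal(size=n))

# Least-squares fit of category 0's logit from all other categories' logits.
A, b = logits[:, 1:], logits[:, 0]
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
top = np.argsort(-np.abs(coef))[:3] + 1  # strongest dependencies, original ids
print("top contributing categories:", top, "coefficients:", coef[top].round(2))
```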
arXiv Detail & Related papers (2022-11-21T09:42:15Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold [30.3185037354742]
When training over normalized deep networks for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon.
We show that better representations can be learned faster via feature normalization.
arXiv Detail & Related papers (2022-09-19T17:26:32Z)
- Memorization-Dilation: Modeling Neural Collapse Under Label Noise [10.134749691813344]
During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation (see the diagnostic sketch after this list).
Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse.
Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
arXiv Detail & Related papers (2022-06-11T13:40:37Z)
- On the Role of Neural Collapse in Transfer Learning [29.972063833424215]
Recent results show that representations learned by a single classifier over many classes are competitive on few-shot learning problems.
We show that neural collapse generalizes to new samples from the training classes, and -- more importantly -- to new classes as well.
arXiv Detail & Related papers (2021-12-30T16:36:26Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training [39.137793683411424]
We introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
arXiv Detail & Related papers (2021-01-29T17:37:17Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as stability and better performance relative to their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z)
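Several of the entries above (normalized features, memorization-dilation, transfer learning) refer to the neural collapse phenomenon itself: features of each class collapsing to their class mean. As a reference point, here is a minimal diagnostic sketch on synthetic features; the arrays and the simple within-class/between-class variability ratio are illustrative assumptions, not code from any of the listed papers.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n, d = 10, 50, 64  # classes, samples per class, feature dimension

# Hypothetical penultimate-layer features: tight clusters around class means,
# mimicking the terminal phase of training described above.
means = rng.normal(size=(K, d))
feats = means[:, None, :] + 0.01 * rng.normal(size=(K, n, d))

# NC1-style diagnostic: within-class scatter relative to between-class scatter
# (a simplified scalar ratio; it tends to 0 as features collapse to the means).
class_means = feats.mean(axis=1)               # (K, d)
global_mean = class_means.mean(axis=0)
within = ((feats - class_means[:, None, :]) ** 2).sum() / (K * n)
between = ((class_means - global_mean) ** 2).sum() / K
print(f"within/between variability: {within / between:.2e}")
```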