Neural Collapse with Normalized Features: A Geometric Analysis over the
Riemannian Manifold
- URL: http://arxiv.org/abs/2209.09211v1
- Date: Mon, 19 Sep 2022 17:26:32 GMT
- Title: Neural Collapse with Normalized Features: A Geometric Analysis over the
Riemannian Manifold
- Authors: Can Yaras and Peng Wang and Zhihui Zhu and Laura Balzano and Qing Qu
- Abstract summary: When training overparameterized deep networks for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon.
We show that better representations can be learned faster via feature normalization.
- Score: 30.3185037354742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training overparameterized deep networks for classification tasks, it
has been widely observed that the learned features exhibit a so-called "neural
collapse" phenomenon. More specifically, for the output features of the
penultimate layer, for each class the within-class features converge to their
means, and the means of different classes exhibit a certain tight frame
structure, which is also aligned with the last layer's classifier. As feature
normalization in the last layer becomes a common practice in modern
representation learning, in this work we theoretically justify the neural
collapse phenomenon for normalized features. Based on an unconstrained feature
model, we simplify the empirical loss function in a multi-class classification
task into a nonconvex optimization problem over the Riemannian manifold by
constraining all features and classifiers over the sphere. In this context, we
analyze the nonconvex landscape of the Riemannian optimization problem over the
product of spheres, showing a benign global landscape in the sense that the
only global minimizers are the neural collapse solutions while all other
critical points are strict saddles with negative curvature. Experimental
results on practical deep networks corroborate our theory and demonstrate that
better representations can be learned faster via feature normalization.
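The normalized setting described in the abstract can be illustrated numerically: under the unconstrained feature model, features are treated as free variables constrained to the unit sphere, and at the neural collapse solution the within-class features concentrate at their (renormalized) class means. The sketch below is a minimal illustration of that geometry, not the authors' code; all variable names and the noise scale are our own assumptions.

```python
import numpy as np

def normalize(v, axis=-1):
    # Project vectors onto the unit sphere (feature normalization).
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
K, n, d = 4, 50, 16  # classes, samples per class, feature dimension

# Simulated penultimate-layer features: class mean plus small noise
# (a stand-in for features near the neural collapse solution).
means = rng.normal(size=(K, d))
feats = normalize(means[:, None, :] + 0.05 * rng.normal(size=(K, n, d)))

# Within-class collapse: each normalized feature should align closely
# with its renormalized class mean.
class_means = normalize(feats.mean(axis=1))
cos_to_mean = np.einsum('knd,kd->kn', feats, class_means)
print(cos_to_mean.min())  # near 1 => small within-class variability
```

As the noise scale shrinks, the minimum cosine similarity approaches 1, mirroring the convergence of within-class features to their means.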
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse
Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continued training can still improve accuracy on the test set, even after the training accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - Generalized Neural Collapse for a Large Number of Classes [33.46269920297418]
We provide an empirical study verifying the occurrence of generalized neural collapse in practical deep neural networks.
We provide a theoretical analysis showing that generalized neural collapse provably occurs under an unconstrained feature model with a spherical constraint.
arXiv Detail & Related papers (2023-10-09T02:27:04Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class
Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method achieves comparable performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - On the Optimization Landscape of Neural Collapse under MSE Loss: Global
Optimality with Unconstrained Features [38.05002597295796]
An intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features of deep neural networks across tasks: they collapse to the vertices of a Simplex Equiangular Tight Frame (ETF).
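The simplex ETF structure referenced here has a standard closed-form construction: the columns of sqrt(K/(K-1)) (I - (1/K) 11^T) give K unit vectors with pairwise cosine similarity -1/(K-1). The snippet below is our own illustration of that construction, not code from any of the listed papers.

```python
import numpy as np

def simplex_etf(K):
    # Standard K-vertex simplex ETF: columns of
    # sqrt(K/(K-1)) * (I - (1/K) * ones), which are already unit-norm.
    M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    return M

K = 5
V = simplex_etf(K)
G = V.T @ V  # Gram matrix: 1 on the diagonal, -1/(K-1) off-diagonal
print(G.round(3))
```

The vertices are maximally separated on the sphere (they span a (K-1)-dimensional subspace), which is the configuration the collapsed class means and classifiers are observed to take.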
arXiv Detail & Related papers (2022-03-02T17:00:18Z) - An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM)
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - A Geometric Analysis of Neural Collapse with Unconstrained Features [40.66585948844492]
We provide the first global optimization landscape analysis of Neural Collapse.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
arXiv Detail & Related papers (2021-05-06T00:00:50Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the infinite-width limit of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.