A Geometric Analysis of Neural Collapse with Unconstrained Features
- URL: http://arxiv.org/abs/2105.02375v1
- Date: Thu, 6 May 2021 00:00:50 GMT
- Title: A Geometric Analysis of Neural Collapse with Unconstrained Features
- Authors: Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias
Sulam, and Qing Qu
- Abstract summary: We provide the first global optimization landscape analysis of $Neural\;Collapse$.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
- Score: 40.66585948844492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first global optimization landscape analysis of
$Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the
last-layer classifiers and features of neural networks during the terminal
phase of training. As recently reported by Papyan et al., this phenomenon
implies that ($i$) the class means and the last-layer classifiers all collapse
to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and
($ii$) cross-example within-class variability of last-layer activations
collapses to zero. We study the problem based on a simplified
$unconstrained\;feature\;model$, which isolates the topmost layers from the
classifier of the neural network. In this context, we show that the classical
cross-entropy loss with weight decay has a benign global landscape, in the
sense that the only global minimizers are the Simplex ETFs while all other
critical points are strict saddles whose Hessians exhibit negative curvature
directions. In contrast to existing landscape analysis for deep neural networks
which is often disconnected from practice, our analysis of the simplified model
not only explains what kind of features are learned in the last layer, but also
shows why they can be efficiently optimized in the simplified setting, matching
the empirical observations in practical deep network architectures. These
findings could have profound implications for optimization, generalization, and
robustness of broad interest. For example,
our experiments demonstrate that one may set the feature dimension equal to the
number of classes and fix the last-layer classifier to be a Simplex ETF for
network training, which reduces memory cost by over $20\%$ on ResNet18 without
sacrificing the generalization performance.
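To make the abstract's recipe concrete, below is a minimal, hedged sketch
(assuming PyTorch; this is not the authors' released code) of the two
ingredients described above: a K x K Simplex ETF used as a fixed last-layer
classifier with the feature dimension set equal to the number of classes, and
the unconstrained feature model objective, i.e. cross-entropy with weight decay
on the free last-layer features. The class count, examples per class, decay
strength, and optimizer settings are illustrative placeholders, not the paper's
exact setup.

```python
# Hedged sketch (not the authors' released code): build a K x K Simplex ETF,
# fix it as the last-layer classifier, and optimize free features under the
# unconstrained feature model with cross-entropy plus weight decay.
import torch
import torch.nn.functional as F

def simplex_etf(num_classes: int) -> torch.Tensor:
    """Return a (K x K) Simplex ETF matrix: unit-norm rows with pairwise
    inner products of -1/(K-1), the global-minimizer geometry described
    in the abstract."""
    K = num_classes
    return (K / (K - 1)) ** 0.5 * (torch.eye(K) - torch.ones(K, K) / K)

# Unconstrained feature model: last-layer features are free variables,
# not outputs of a backbone network.
K, n = 10, 5                                   # classes, examples per class (illustrative)
W = simplex_etf(K)                             # fixed classifier, feature dim = num classes
H = torch.randn(K * n, K, requires_grad=True)  # free features, one row per example
labels = torch.arange(K).repeat_interleave(n)
lam_H = 5e-4                                   # weight-decay strength (illustrative)

def ufm_loss(H, W, labels, lam_H):
    # Cross-entropy on the logits H W^T plus weight decay on the free features;
    # W gets no decay term here because it is frozen at the Simplex ETF.
    logits = H @ W.t()
    return F.cross_entropy(logits, labels) + 0.5 * lam_H * H.pow(2).sum()

opt = torch.optim.SGD([H], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = ufm_loss(H, W, labels, lam_H)
    loss.backward()
    opt.step()

# The neural-collapse picture predicts that, at convergence, within-class
# features collapse to their class means and align with the ETF rows.
```

In a full network, the free matrix H would be replaced by penultimate-layer
outputs while the ETF head stays frozen; freezing the head and shrinking the
feature dimension to the number of classes are what the abstract credits for
the reported >20% memory saving on ResNet18.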
Related papers
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax
Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class
Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z) - Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced
Data [12.225207401994737]
We show that complex systems with massive amounts of parameters exhibit the same structural properties when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors.
arXiv Detail & Related papers (2023-01-01T16:29:56Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Neural Collapse with Normalized Features: A Geometric Analysis over the
Riemannian Manifold [30.3185037354742]
When training over normalized deep networks for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon.
We show that better representations can be learned faster via feature normalization.
arXiv Detail & Related papers (2022-09-19T17:26:32Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - On the Optimization Landscape of Neural Collapse under MSE Loss: Global
Optimality with Unconstrained Features [38.05002597295796]
An intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features of deep neural networks for classification tasks:
they collapse to the vertices of a Simplex Equiangular Tight Frame (ETF).
arXiv Detail & Related papers (2022-03-02T17:00:18Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)