On the Optimization Landscape of Neural Collapse under MSE Loss: Global
Optimality with Unconstrained Features
- URL: http://arxiv.org/abs/2203.01238v1
- Date: Wed, 2 Mar 2022 17:00:18 GMT
- Title: On the Optimization Landscape of Neural Collapse under MSE Loss: Global
Optimality with Unconstrained Features
- Authors: Jinxin Zhou, Xiao Li, Tianyu Ding, Chong You, Qing Qu and Zhihui Zhu
- Abstract summary: The class means and last-layer classifiers collapse to the vertices of a Simplex Equiangular Tight Frame (ETF).
This intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features of deep neural networks trained for classification tasks.
- Score: 38.05002597295796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training deep neural networks for classification tasks, an intriguing
empirical phenomenon has been widely observed in the last-layer classifiers and
features, where (i) the class means and the last-layer classifiers all collapse
to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and
(ii) cross-example within-class variability of last-layer activations collapses
to zero. This phenomenon is called Neural Collapse (NC), which seems to take
place regardless of the choice of loss functions. In this work, we justify NC
under the mean squared error (MSE) loss, where recent empirical evidence shows
that it performs comparably or even better than the de-facto cross-entropy
loss. Under a simplified unconstrained feature model, we provide the first
global landscape analysis for vanilla nonconvex MSE loss and show that the
(only!) global minimizers are neural collapse solutions, while all other
critical points are strict saddles whose Hessians exhibit negative curvature
directions. Furthermore, we justify the usage of rescaled MSE loss by probing
the optimization landscape around the NC solutions, showing that the landscape
can be improved by tuning the rescaling hyperparameters. Finally, our
theoretical findings are experimentally verified on practical network
architectures.
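To make property (i) concrete, the following sketch (illustrative NumPy code, not taken from the paper; the sizes and helper names are assumptions) constructs a K-class simplex ETF and verifies the geometry that the class means and classifier rows are predicted to collapse to: equal norms and pairwise inner products of -1/(K-1).

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Construct a K-class simplex ETF embedded in R^d (requires d >= K - 1).

    The columns have equal norm and pairwise inner products of -1/(K-1),
    the structure neural collapse predicts for class means and classifier
    rows up to scaling. Illustrative sketch, not code from the paper.
    """
    assert d >= K - 1
    rng = np.random.default_rng(seed)
    # Random partial orthogonal matrix P in R^{d x K} with P^T P = I_K.
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # M = sqrt(K/(K-1)) * P * (I_K - 11^T / K).
    return np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)

K, d = 10, 512                      # assumed sizes for illustration
M = simplex_etf(K, d)
G = M.T @ M                         # Gram matrix of the ETF vectors
off_diag = G[~np.eye(K, dtype=bool)]
print(np.allclose(np.diag(G), 1.0))            # equal norms
print(np.allclose(off_diag, -1.0 / (K - 1)))   # equal angles, cos = -1/(K-1)
```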
Related papers
- Supervised Contrastive Representation Learning: Landscape Analysis with
Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
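The "minimal within-class variations" statement is commonly quantified in the neural-collapse literature by a within-class variability metric such as tr(Σ_W Σ_B^†)/K; the snippet below is an illustrative implementation of that standard metric, not code from the paper.

```python
import numpy as np

def nc1_metric(features, labels):
    """Within-class variability collapse: tr(Sigma_W @ pinv(Sigma_B)) / K.

    features: (N, d) last-layer activations; labels: (N,) integer classes.
    The value approaches 0 as within-class variation vanishes relative to
    the between-class spread. Illustrative sketch only.
    """
    classes = np.unique(labels)
    K, d = len(classes), features.shape[1]
    global_mean = features.mean(axis=0)
    Sigma_W = np.zeros((d, d))
    Sigma_B = np.zeros((d, d))
    for c in classes:
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        Sigma_W += (Xc - mu_c).T @ (Xc - mu_c) / len(features)
        diff = (mu_c - global_mean)[:, None]
        Sigma_B += diff @ diff.T / K
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K
```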
arXiv Detail & Related papers (2024-02-29T06:02:45Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities for analyzing closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Are All Losses Created Equal: A Neural Collapse Perspective [36.0354919583995]
Cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks.
We show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse.
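For reference, the label smoothing (LS) and focal loss (FL) mentioned above follow their standard definitions; the sketch below (illustrative only, using one common convention for each, not code from the paper) writes them out.

```python
import numpy as np

def label_smoothing_targets(labels, K, eps=0.1):
    """Soft targets: (1 - eps) on the true class plus eps/K spread uniformly."""
    onehot = np.eye(K)[labels]
    return (1.0 - eps) * onehot + eps / K

def focal_loss(logits, labels, gamma=2.0):
    """Focal loss: mean of -(1 - p_t)^gamma * log(p_t) over examples."""
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p_t = probs[np.arange(len(labels)), labels]
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
```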
arXiv Detail & Related papers (2022-10-04T00:36:45Z)
- Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold [30.3185037354742]
When training normalized deep networks for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon.
We show that better representations can be learned faster via feature normalization.
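Here, feature normalization refers to constraining the last-layer features to a sphere of fixed radius; a minimal illustrative helper (not from the paper) looks like this.

```python
import numpy as np

def normalize_features(X, radius=1.0, eps=1e-12):
    """Project each row (feature vector) of X onto the sphere of given radius."""
    return radius * X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
```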
arXiv Detail & Related papers (2022-09-19T17:26:32Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
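A fixed, non-learnable ETF classifier can be realized by setting the last linear layer's weight matrix to a simplex ETF and freezing it; the sketch below is an illustrative NumPy version (the sizes and random basis are assumptions), where only the backbone producing the features would be trained.

```python
import numpy as np

K, d = 10, 512                                    # assumed number of classes / feature dim
rng = np.random.default_rng(0)
P, _ = np.linalg.qr(rng.standard_normal((d, K)))  # random partial orthogonal basis
# Frozen (K, d) simplex-ETF classifier; never updated during training.
W_fixed = (np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)).T

def classify(features):
    """Logits from the frozen ETF classifier for (N, d) backbone features."""
    return features @ W_fixed.T                   # (N, K) logits
```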
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
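In the unconstrained features model, both the classifier and the last-layer features are treated as free optimization variables; a minimal illustrative form of the regularized MSE objective (the constants and regularization weights are assumptions, not the paper's exact choices) is sketched below.

```python
import numpy as np

def ufm_mse_loss(W, H, Y, lam_W=5e-4, lam_H=5e-4):
    """Regularized MSE under the unconstrained features model (illustrative).

    W: (K, d) classifier, H: (d, N) features treated as free variables,
    Y: (K, N) one-hot targets. Hyperparameters are placeholder values.
    """
    N = Y.shape[1]
    fit = 0.5 / N * np.linalg.norm(W @ H - Y, "fro") ** 2
    reg = (0.5 * lam_W * np.linalg.norm(W, "fro") ** 2
           + 0.5 * lam_H * np.linalg.norm(H, "fro") ** 2)
    return fit + reg
```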
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM)
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z)
- A Geometric Analysis of Neural Collapse with Unconstrained Features [40.66585948844492]
We provide the first global optimization landscape analysis of Neural Collapse.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
arXiv Detail & Related papers (2021-05-06T00:00:50Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.