Are All Losses Created Equal: A Neural Collapse Perspective
- URL: http://arxiv.org/abs/2210.02192v1
- Date: Tue, 4 Oct 2022 00:36:45 GMT
- Title: Are All Losses Created Equal: A Neural Collapse Perspective
- Authors: Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu,
Zhihui Zhu
- Abstract summary: Cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks.
We show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse.
- Score: 36.0354919583995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While cross entropy (CE) is the most commonly used loss to train deep neural
networks for classification tasks, many alternative losses have been developed
to obtain better empirical performance. Among them, which one is the best to
use is still a mystery, because there seem to be multiple factors affecting the
answer, such as properties of the dataset, the choice of network architecture,
and so on. This paper studies the choice of loss function by examining the
last-layer features of deep networks, drawing inspiration from a recent line of
work showing that the global optimal solution of CE and mean-square-error (MSE)
losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large
networks trained until convergence, (i) all features of the same class collapse
to the corresponding class mean and (ii) the means associated with different
classes are in a configuration where their pairwise distances are all equal and
maximized. We extend such results and show through global solution and
landscape analyses that a broad family of loss functions including commonly
used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence,
all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on
training data. Based on the unconstrained feature model assumption, we provide
a global landscape analysis for the LS loss and a local landscape analysis for
the FL loss, showing that the only global minimizers are neural collapse
solutions, while all other critical points are strict saddles whose Hessians
exhibit negative curvature directions, globally for the LS loss and locally
near the optimal solution for the FL loss. The
experiments further show that Neural Collapse features obtained from all
relevant losses lead to largely identical performance on test data as well,
provided that the network is sufficiently large and trained until convergence.
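As a rough illustration of the setting described in the abstract, the sketch below (not the authors' code) trains an unconstrained feature model, in which the last-layer features and classifier weights are free optimization variables, under any of the CE, LS, FL, or MSE losses, and then checks the two Neural Collapse properties numerically. The dimensions, learning rate, regularization weights, and the particular focal-loss form are illustrative assumptions.

```python
# Minimal sketch of the unconstrained feature model (UFM): last-layer features H
# and classifier (W, b) are free variables trained under one of CE / LS / FL / MSE.
# Hyperparameters and regularization weights are illustrative, not from the paper.
import torch
import torch.nn.functional as F

K, n, d = 5, 20, 32                               # classes, samples per class, feature dim
y = torch.arange(K).repeat_interleave(n)          # n labels per class
H = torch.randn(K * n, d, requires_grad=True)     # unconstrained last-layer features
W = torch.randn(K, d, requires_grad=True)         # linear classifier weights
b = torch.zeros(K, requires_grad=True)

def focal_loss(logits, target, gamma=2.0):
    # FL = mean over samples of -(1 - p_t)^gamma * log(p_t), p_t = true-class probability
    logp_t = F.log_softmax(logits, dim=1).gather(1, target[:, None]).squeeze(1)
    p_t = logp_t.exp()
    return (-((1.0 - p_t) ** gamma) * logp_t).mean()

losses = {
    "CE":  lambda z, t: F.cross_entropy(z, t),
    "LS":  lambda z, t: F.cross_entropy(z, t, label_smoothing=0.1),
    "FL":  focal_loss,
    "MSE": lambda z, t: F.mse_loss(z, F.one_hot(t, K).float()),
}

loss_fn = losses["LS"]                            # pick any of the four
opt = torch.optim.Adam([H, W, b], lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    logits = H @ W.T + b
    # small weight decay on H and W, as in regularized UFM formulations
    loss = loss_fn(logits, y) + 5e-4 * (H.pow(2).sum() + W.pow(2).sum())
    loss.backward()
    opt.step()

with torch.no_grad():
    means = torch.stack([H[y == k].mean(0) for k in range(K)])
    # (i) within-class collapse: features of each class concentrate at the class mean
    nc1 = torch.stack([(H[y == k] - means[k]).pow(2).sum(1).mean() for k in range(K)]).mean()
    # (ii) simplex ETF: centered class means are equinorm with equal pairwise distances
    centered = means - means.mean(0)
    print("within-class variance:", nc1.item())
    print("class-mean norms:     ", centered.norm(dim=1))
    print("pairwise distances:\n", torch.cdist(centered, centered))
```

If the abstract's claim holds in this toy setting, any of the four losses should drive the within-class variance toward zero and make the pairwise distances between centered class means nearly equal, matching properties (i) and (ii).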
Related papers
- Supervised Contrastive Representation Learning: Landscape Analysis with
Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
arXiv Detail & Related papers (2024-02-29T06:02:45Z) - Multi-stage feature decorrelation constraints for improving CNN
classification performance [14.09469656684143]
This article proposes a multi-stage feature decorrelation loss (MFD Loss) for CNNs.
MFD Loss refines effective features and eliminates information redundancy by constraining the correlation of features at all stages.
Compared with supervised learning using Softmax Loss alone, experiments on several commonly used datasets with several typical CNNs show that the classification performance of Softmax Loss + MFD Loss is significantly better (an illustrative sketch of such a decorrelation penalty appears after this list).
arXiv Detail & Related papers (2023-08-24T16:00:01Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - On the Optimization Landscape of Neural Collapse under MSE Loss: Global
Optimality with Unconstrained Features [38.05002597295796]
An intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features of deep neural networks trained for classification tasks.
The within-class features collapse to their class means, and the class means together with the classifier collapse to the vertices of a Simplex Equiangular Tight Frame (ETF).
arXiv Detail & Related papers (2022-03-02T17:00:18Z) - Taxonomizing local versus global structure in neural network loss
landscapes [60.206524503782006]
We show that the best test accuracy is obtained when the loss landscape is globally well-connected.
We also show that globally poorly-connected landscapes can arise when models are small or when they are trained on lower-quality data.
arXiv Detail & Related papers (2021-07-23T13:37:14Z) - A Geometric Analysis of Neural Collapse with Unconstrained Features [40.66585948844492]
We provide the first global optimization landscape analysis of Neural Collapse.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
arXiv Detail & Related papers (2021-05-06T00:00:50Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using
Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation, and we demonstrate the effectiveness of our proposal through experiments (an illustrative sketch of a sigmoid-weighted per-instance loss appears after this list).
arXiv Detail & Related papers (2020-09-18T12:34:40Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)