Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss
with Imbalanced Data
- URL: http://arxiv.org/abs/2309.09725v2
- Date: Tue, 24 Oct 2023 18:21:58 GMT
- Title: Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss
with Imbalanced Data
- Authors: Wanli Hong and Shuyang Ling
- Abstract summary: We study the extension of the Neural Collapse (NC) phenomenon to imbalanced data under the cross-entropy loss function.
Our contributions are multi-fold compared with the state-of-the-art results.
- Score: 1.0152838128195467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the huge success of deep neural networks (DNNs)
in various computer vision and text processing tasks. Interestingly, these
DNNs with a massive number of parameters share similar structural properties in
their feature representations and last-layer classifiers at the terminal phase of
training (TPT). Specifically, if the training data are balanced (each class
has the same number of samples), it is observed that the feature vectors of
samples from the same class converge to their corresponding in-class mean
feature, and the pairwise angles between the class means are all equal. This
fascinating phenomenon is known as Neural Collapse (NC), first termed by
Papyan, Han, and Donoho in 2020. Many recent works have managed to explain this
phenomenon theoretically by adopting the so-called unconstrained feature model
(UFM). In this paper, we study the extension of the NC phenomenon to imbalanced
data under the cross-entropy loss function in the context of the unconstrained
feature model. Our contributions are multi-fold compared with the state-of-the-art
results: (a) we show that the feature vectors still exhibit the collapse
phenomenon, i.e., the features within the same class collapse to the same mean
vector; (b) the mean feature vectors no longer form an equiangular tight frame;
instead, their pairwise angles depend on the per-class sample sizes; (c) we
precisely characterize the sharp threshold at which minority collapse (i.e., the
feature vectors of the minority classes collapse to one single vector) takes
place; (d) finally, we argue that the effect of the imbalance in data size
diminishes as the sample size grows. Our results provide a complete picture of
NC under the cross-entropy loss for imbalanced data. Numerical experiments
confirm our theoretical analysis.
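For readers unfamiliar with the unconstrained feature model (UFM), the following is a minimal sketch of the kind of regularized cross-entropy objective typically studied in this line of work; the notation (per-class sample sizes n_k, regularization weights lambda_W, lambda_H, lambda_b) is generic and may differ from the paper's exact formulation. The last-layer features h_{k,i} are free optimization variables rather than outputs of a network:

```latex
% Regularized cross-entropy UFM: K classes, n_k samples in class k, N = sum_k n_k
\min_{W,\,H,\,b}\;
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}
    \mathcal{L}_{\mathrm{CE}}\!\bigl(W h_{k,i} + b,\; e_k\bigr)
  \;+\;\frac{\lambda_W}{2}\lVert W\rVert_F^2
  \;+\;\frac{\lambda_H}{2}\lVert H\rVert_F^2
  \;+\;\frac{\lambda_b}{2}\lVert b\rVert_2^2
```

In the balanced case (all n_k equal), NC predicts that the centered class means mu_k - mu_G form a simplex equiangular tight frame, i.e. their pairwise cosines all equal -1/(K-1); claim (b) above says that under imbalance the angles instead depend on the n_k. A small numerical check of claims (a) and (b) on any feature matrix might look as follows (the function names nc1_variability and classmean_cosines are illustrative, not from the paper):

```python
import numpy as np

def nc1_variability(features: np.ndarray, labels: np.ndarray) -> float:
    """Within-class variability relative to between-class variability.

    A value near zero indicates within-class collapse (claim (a)).
    Uses the common NC1 metric tr(Sigma_W pinv(Sigma_B)) / K.
    """
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    classes = np.unique(labels)
    sigma_w = np.zeros((dim, dim))  # within-class covariance
    sigma_b = np.zeros((dim, dim))  # between-class covariance
    for k in classes:
        fk = features[labels == k]
        centered = fk - fk.mean(axis=0)
        sigma_w += centered.T @ centered / len(features)
        cm = (fk.mean(axis=0) - global_mean)[:, None]
        sigma_b += (len(fk) / len(features)) * (cm @ cm.T)
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))

def classmean_cosines(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Pairwise cosines of the centered class means.

    Balanced data: NC predicts all off-diagonal entries equal -1/(K-1).
    Imbalanced data: claim (b) predicts they depend on the class sizes.
    """
    global_mean = features.mean(axis=0)
    mus = np.stack([features[labels == k].mean(axis=0) - global_mean
                    for k in np.unique(labels)])
    mus /= np.linalg.norm(mus, axis=1, keepdims=True)
    return mus @ mus.T
```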
Related papers
- Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse [32.06666853127924]
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a symmetric geometric structure referred to as neural collapse.
Existing analyses, however, rely on the unconstrained feature model, in which the features of the penultimate layer are free variables; this makes the model data-agnostic and, hence, puts into question its ability to capture actual DNN training.
We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers, and (ii) bounded conditioning of the features before the linear part.
arXiv Detail & Related papers (2024-10-07T10:16:40Z) - The Prevalence of Neural Collapse in Neural Multivariate Regression [3.691119072844077]
It has been observed that neural networks exhibit Neural Collapse (NC) during the final stage of training for classification problems.
To our knowledge, this is the first empirical and theoretical study of neural collapse in the context of regression.
arXiv Detail & Related papers (2024-09-06T10:45:58Z) - Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model [25.61363481391964]
We show that when the training dataset is class-imbalanced, some Neural Collapse (NC) properties will no longer be true.
In this paper, we generalize NC to the imbalanced regime for the cross-entropy loss under the unconstrained ReLU feature model.
We find that the weights are aligned to the scaled and centered class-means, with scaling factors that depend on the number of training samples in each class.
arXiv Detail & Related papers (2024-01-04T04:53:31Z) - Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced
Data [12.225207401994737]
We show that complex systems with a massive number of parameters exhibit the same structural properties when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results characterize the geometry to which the last-layer features and classifiers converge.
arXiv Detail & Related papers (2023-01-01T16:29:56Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Imbalance Trouble: Revisiting Neural-Collapse Geometry [27.21274327569783]
We introduce Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon.
We prove that, for the UFM with cross-entropy loss and vanishing regularization, the global minimizers converge to the SELI geometry.
We present experiments on synthetic and real datasets that confirm convergence to the SELI geometry.
arXiv Detail & Related papers (2022-08-10T18:10:59Z) - Curvature-informed multi-task learning for graph networks [56.155331323304]
State-of-the-art graph neural networks attempt to predict multiple properties simultaneously.
We investigate a potential source of inefficiency in such multi-task training: the curvature of each property's loss surface varies significantly across properties, leading to inefficient learning.
arXiv Detail & Related papers (2022-08-02T18:18:41Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that this approach achieves performance on par with a learnable classifier on image classification with balanced datasets.
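To make the fixed-classifier idea above concrete, here is a minimal PyTorch sketch of a last-layer head whose weight matrix is set to a random simplex ETF and never trained; the names simplex_etf and FixedETFClassifier are illustrative and not taken from the paper:

```python
import math
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """K x d matrix whose rows have unit norm and pairwise cosine -1/(K-1)."""
    K, d = num_classes, feat_dim
    assert d >= K, "feature dimension must be at least the number of classes"
    # Partial orthogonal matrix with orthonormal columns (d x K).
    U, _ = torch.linalg.qr(torch.randn(d, K))
    # Simplex ETF: sqrt(K/(K-1)) * U (I_K - 11^T/K); columns give the class directions.
    M = math.sqrt(K / (K - 1)) * U @ (torch.eye(K) - torch.ones(K, K) / K)
    return M.T  # one row per class

class FixedETFClassifier(nn.Module):
    """Last-layer classifier fixed to a simplex ETF (no trainable parameters)."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.register_buffer("weight", simplex_etf(num_classes, feat_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Logits are inner products of features with the fixed class directions.
        return features @ self.weight.T
```

Only the feature extractor before this head is trained; under cross-entropy the backbone features are pushed toward the fixed ETF directions, which is the setting the summary above reports as matching a learnable classifier on balanced datasets.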
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with
Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves a linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)