Related papers: Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model

URL: http://arxiv.org/abs/2401.02058v2
Date: Thu, 6 Jun 2024 09:36:49 GMT
Title: Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
Authors: Hien Dang, Tho Tran, Tan Nguyen, Nhat Ho
Abstract summary: We show that when the training dataset is class-imbalanced, some Neural Collapse (NC) properties no longer hold. In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model. We find that the classifier weights are aligned to the scaled and centered class-means, with scaling factors that depend on the number of training samples of each class.
Score: 25.61363481391964
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The current paradigm of training deep neural networks for classification tasks includes minimizing the empirical risk, pushing the training loss value towards zero even after the training error has vanished. In this terminal phase of training, it has been observed that the last-layer features collapse to their class-means and these class-means converge to the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is termed Neural Collapse (NC). To theoretically understand this phenomenon, recent works employ a simplified unconstrained feature model to prove that NC emerges at the global solutions of the training problem. However, when the training dataset is class-imbalanced, some NC properties no longer hold. For example, the class-means geometry skews away from the simplex ETF when the loss converges. In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model. We prove that, while the within-class feature collapse property still holds in this setting, the class-means converge to a structure consisting of orthogonal vectors with different lengths. Furthermore, we find that the classifier weights are aligned to the scaled and centered class-means, with scaling factors that depend on the number of training samples of each class, which generalizes NC in the class-balanced setting. We empirically verify our results through experiments on practical architectures and datasets.
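The two geometries contrasted in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the helper names `simplex_etf` and `pairwise_cosines`, the class count `K = 4`, and the vector lengths are assumptions chosen for illustration. In particular, the lengths are arbitrary and do not reproduce the paper's closed-form scaling in terms of class sizes.

```python
import numpy as np

def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a K x K matrix whose columns form a unit-norm simplex ETF."""
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

def pairwise_cosines(vectors: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of columns of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=0, keepdims=True)
    return unit.T @ unit

K = 4  # illustrative number of classes (assumption)

# Balanced case: class-means at the vertices of a simplex ETF.
etf = simplex_etf(K)
etf_norms = np.linalg.norm(etf, axis=0)   # all equal to 1
etf_cos = pairwise_cosines(etf)           # off-diagonal entries are -1/(K-1)

# Imbalanced case as described in the abstract: orthogonal class-means with
# different lengths. These lengths are hypothetical placeholders, not the
# values derived in the paper.
lengths = np.array([3.0, 2.0, 1.5, 1.0])
ortho_means = np.diag(lengths)
ortho_cos = pairwise_cosines(ortho_means)  # identity: all pairs orthogonal

print(np.round(etf_cos, 3))
print(np.round(ortho_cos, 3))
```

In the ETF, every pair of class-means has the same negative cosine and every vector has the same norm. In the orthogonal structure, all pairwise cosines are zero and only the norms differ between classes.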

Related papers

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks [59.552873049024775]
We show that compute-optimally trained models exhibit a remarkably precise universality. With learning rate decay, the collapse becomes so tight that differences in the normalized curves across models fall below the noise floor. We explain these phenomena by connecting collapse to the power-law structure in typical neural scaling laws.
arXiv Detail & Related papers (2025-07-02T20:03:34Z)
Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data [1.0152838128195467]
We study the extension of the Neural Collapse (NC) phenomenon to imbalanced data under the cross-entropy loss function. Our contribution is multi-fold compared with the state-of-the-art results.
arXiv Detail & Related papers (2023-09-18T12:45:08Z)
Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions. We deal with this misalignment dilemma in FSCIL, inspired by the recently discovered phenomenon named neural collapse. We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z)
Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data [12.225207401994737]
We show that complex systems with massive amounts of parameters exhibit the same structural properties when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of vectors.
arXiv Detail & Related papers (2023-01-01T16:29:56Z)
Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on the large-scale classification and segmentation datasets and our ARB-Loss can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z)
Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks. Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM). In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant-level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
Prevalence of Neural Collapse during the terminal phase of deep learning training [7.031848258307718]
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT). During TPT, the training error stays effectively zero while training loss is pushed towards zero. The symmetric and very simple geometry induced by the TPT confers important benefits, including better performance, better generalization, and better interpretability.
arXiv Detail & Related papers (2020-08-18T23:12:54Z)
Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning [59.65752299209042]
We investigate learning a ConvNet under such a scenario. We find that a ConvNet significantly over-fits the minor classes. We propose to incorporate class-dependent temperatures (CDT) when training a ConvNet.
arXiv Detail & Related papers (2020-01-06T03:52:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.