Prevalence of Neural Collapse during the terminal phase of deep learning training
- URL: http://arxiv.org/abs/2008.08186v2
- Date: Fri, 21 Aug 2020 16:15:50 GMT
- Title: Prevalence of Neural Collapse during the terminal phase of deep learning training
- Authors: Vardan Papyan, X.Y. Han, David L. Donoho
- Abstract summary: Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT).
During TPT, the training error stays effectively zero while training loss is pushed towards zero.
The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
- Score: 7.031848258307718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern practice for training classification deepnets involves a Terminal
Phase of Training (TPT), which begins at the epoch where training error first
vanishes. During TPT, the training error stays effectively zero while training
loss is pushed towards zero. Direct measurements of TPT, for three prototypical
deepnet architectures and across seven canonical classification datasets,
expose a pervasive inductive bias we call Neural Collapse, involving four
deeply interconnected phenomena: (NC1) Cross-example within-class variability
of last-layer training activations collapses to zero, as the individual
activations themselves collapse to their class-means; (NC2) The class-means
collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up
to rescaling, the last-layer classifiers collapse to the class-means, or in
other words to the Simplex ETF, i.e. to a self-dual configuration; (NC4) For a
given activation, the classifier's decision collapses to simply choosing
whichever class has the closest train class-mean, i.e. the Nearest Class Center
(NCC) decision rule. The symmetric and very simple geometry induced by the TPT
confers important benefits, including better generalization performance, better
robustness, and better interpretability.
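The four phenomena are concrete enough to check numerically. The following NumPy sketch is illustrative only (the function names simplex_etf, nc1_statistic, and ncc_predict are mine, not the paper's): it constructs the Simplex ETF of (NC2), computes a within-class variability statistic in the spirit of (NC1), and applies the Nearest Class Center rule of (NC4) to toy activations.

```python
import numpy as np

def simplex_etf(K):
    """K unit vectors (rows) whose pairwise cosines all equal -1/(K-1):
    the Simplex Equiangular Tight Frame of (NC2)/(NC3)."""
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

def nc1_statistic(H, y):
    """Tr(Sigma_W @ pinv(Sigma_B)) / K: within-class variability measured
    relative to between-class variability; it tends to zero under (NC1)."""
    classes = np.unique(y)
    K = len(classes)
    mu_G = H.mean(axis=0)                              # global mean
    means = np.stack([H[y == c].mean(axis=0) for c in classes])
    Sigma_B = (means - mu_G).T @ (means - mu_G) / K    # between-class scatter
    centered = H - means[np.searchsorted(classes, y)]  # subtract class-means
    Sigma_W = centered.T @ centered / len(H)           # within-class scatter
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

def ncc_predict(H, means):
    """(NC4): classify each activation by its nearest class-mean."""
    d2 = ((H[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy activations drawn tightly around the ETF vertices behave as in TPT.
rng = np.random.default_rng(0)
K, n = 4, 100
M = simplex_etf(K)
y = np.repeat(np.arange(K), n)
H = M[y] + 0.01 * rng.standard_normal((K * n, K))
print(nc1_statistic(H, y))              # ~0: (NC1) collapse
print((ncc_predict(H, M) == y).mean())  # ~1.0: NCC recovers the labels
```

On a real training run, H would be the matrix of last-layer training activations and means would be the per-class train means.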
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
We show that when the training dataset is class-imbalanced, some Neural Collapse (NC) properties no longer hold.
In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model.
We find that the weights are aligned to the scaled and centered class-means, with scaling factors that depend on the number of training samples of each class.
arXiv Detail & Related papers (2024-01-04T04:53:31Z)
- Inducing Neural Collapse to a Fixed Hierarchy-Aware Frame for Reducing Mistake Severity [0.0]
We propose to fix the linear classifier of a deep neural network to a Hierarchy-Aware Frame (HAFrame).
We demonstrate that our approach reduces the mistake severity of the model's predictions while maintaining its top-1 accuracy on several datasets.
arXiv Detail & Related papers (2023-03-10T03:44:01Z)
- Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z)
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and an imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
- Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC [66.71660672526349]
We propose a sparsely updating variant of the Fully Connected (FC) layer, named Partial FC (PFC).
In each iteration, positive class centers and a random subset of negative class centers are selected to compute the margin-based softmax loss.
The computing requirement, the probability of inter-class conflict, and the frequency of passive update on tail class centers, are dramatically reduced.
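To make the sampling step above concrete, here is a minimal NumPy sketch of the idea, not the authors' Partial FC implementation: the margin term of the loss is omitted, sizes are toy, and pfc_sample / sampled_softmax_loss are hypothetical names of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, emb_dim, batch = 100_000, 64, 32  # toy stand-in for the large-K face regime
W = rng.standard_normal((num_classes, emb_dim)) / np.sqrt(emb_dim)  # class centers

def pfc_sample(labels, sample_ratio=0.1):
    """Select every positive class center plus a random subset of
    negative centers, as in the sampling scheme described above."""
    pos = np.unique(labels)
    n_sampled = int(num_classes * sample_ratio)
    neg_pool = np.setdiff1d(np.arange(num_classes), pos)
    neg = rng.choice(neg_pool, size=n_sampled - len(pos), replace=False)
    cols = np.concatenate([pos, neg])
    remap = {c: i for i, c in enumerate(cols)}      # original id -> local column
    return cols, np.array([remap[c] for c in labels])

def sampled_softmax_loss(feats, labels):
    """Cross-entropy over the sampled centers only (margin term omitted)."""
    cols, local = pfc_sample(labels)
    logits = feats @ W[cols].T                      # (batch, sampled), not (batch, K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), local].mean()

feats = rng.standard_normal((batch, emb_dim))
labels = rng.integers(0, num_classes, size=batch)
print(sampled_softmax_loss(feats, labels))
```

Only the sampled rows of W receive gradient in a given iteration, which is what cuts the computing requirement and the frequency of passive updates on tail-class centers.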
arXiv Detail & Related papers (2022-03-28T14:33:21Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
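A minimal sketch of that setup, under the assumption that the fixed classifier is a simplex ETF carried into feature space by a random orthogonal map (random_etf_classifier is a hypothetical name of mine, not the paper's code; the ETF construction is the standard one, repeated here for self-containment):

```python
import numpy as np

def simplex_etf(K):
    """Standard simplex ETF: K unit rows with pairwise cosine -1/(K-1)."""
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

def random_etf_classifier(K, d, rng):
    """A simplex ETF embedded in R^d (d >= K) by a random orthogonal map.
    Created once and never updated during training."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, K)))  # orthonormal columns
    return (Q @ simplex_etf(K).T).T                   # (K, d): one fixed row per class

rng = np.random.default_rng(0)
W = random_etf_classifier(K=10, d=512, rng=rng)
G = W @ W.T
print(np.allclose(np.diag(G), 1.0))                   # unit-norm class vectors
print(np.allclose(G[~np.eye(10, dtype=bool)], -1/9))  # still equiangular after embedding
# Training would then optimize only the backbone producing features h,
# with logits = h @ W.T and W held fixed.
```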
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path [11.181590224799224]
Recent work discovered a phenomenon called Neural Collapse (NC) that occurs pervasively in today's deep net training paradigm.
In this work, we establish the empirical reality of MSE-NC by reporting experimental observations for three prototypical networks and five canonical datasets.
We produce closed-form dynamics that predict full Neural Collapse in an unconstrained features model.
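The unconstrained features model mentioned above treats the last-layer features themselves as free optimization variables. The sketch below is not the paper's closed-form dynamics; it is plain gradient descent on an MSE loss with weight decay, with toy sizes, initialization, and step size chosen by me, under which the predicted collapse becomes numerically visible.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, d = 4, 50, 16                    # classes, examples per class, feature dim
N = K * n
y = np.repeat(np.arange(K), n)
Y = np.eye(K)[y].T                     # (K, N) one-hot targets
H = rng.standard_normal((d, N))        # free "unconstrained" features
W = 0.1 * rng.standard_normal((K, d))  # linear classifier
lr, lam = 0.5, 1e-3                    # step size and weight decay (my choices)

for _ in range(20_000):
    R = W @ H - Y                      # residual of the MSE fit
    W -= lr * (R @ H.T / N + lam * W)  # grad of 0.5*||WH-Y||^2/N + 0.5*lam*||W||^2
    H -= lr * (W.T @ R / N + lam * H)  # same loss, gradient w.r.t. the features

# (NC1)-style check: within-class variability of the learned features.
means = np.stack([H[:, y == c].mean(axis=1) for c in range(K)])  # (K, d)
within = np.mean([np.var(H[:, y == c] - means[c][:, None]) for c in range(K)])
between = np.var(means - means.mean(axis=0))
print(within / between)                # small: features concentrate at class-means
```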
arXiv Detail & Related papers (2021-06-03T18:31:41Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)