Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
- URL: http://arxiv.org/abs/2301.00437v5
- Date: Sun, 18 Jun 2023 07:55:53 GMT
- Title: Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
- Authors: Hien Dang and Tho Tran and Stanley Osher and Hung Tran-The and Nhat Ho and Tan Nguyen
- Abstract summary: We show that complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors whose lengths depend on the amount of data in their corresponding classes.
- Score: 12.225207401994737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep neural networks have achieved impressive performance on tasks
from image classification to natural language processing. Surprisingly, these
complex systems with massive amounts of parameters exhibit the same structural
properties in their last-layer features and classifiers across canonical
datasets when training until convergence. In particular, it has been observed
that the last-layer features collapse to their class-means, and those
class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This
phenomenon is known as Neural Collapse (NC). Recent papers have theoretically
shown that NC emerges in the global minimizers of training problems with the
simplified "unconstrained feature model". In this context, we take a step
further and prove the NC occurrences in deep linear networks for the popular
mean squared error (MSE) and cross entropy (CE) losses, showing that global
solutions exhibit NC properties across the linear layers. Furthermore, we
extend our study to imbalanced data for MSE loss and present the first
geometric analysis of NC under the bias-free setting. Our results demonstrate the
convergence of the last-layer features and classifiers to a geometry consisting
of orthogonal vectors, whose lengths depend on the amount of data in their
corresponding classes. Finally, we empirically validate our theoretical
analyses on synthetic and practical network architectures with both balanced
and imbalanced scenarios.
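To make the geometry above concrete, the following is a minimal NumPy sketch (not code from the paper; the class count K and feature dimension d are arbitrary example values) that constructs a K-class simplex ETF and checks its defining properties: unit-length class-mean directions whose pairwise inner products all equal -1/(K-1).

```python
import numpy as np

K, d = 4, 16                      # number of classes and feature dimension (example values)
rng = np.random.default_rng(0)

# Partial orthogonal matrix P of shape (d, K) with P.T @ P = I_K.
P, _ = np.linalg.qr(rng.standard_normal((d, K)))

# Simplex ETF: K unit-norm vectors whose pairwise inner products all equal -1/(K-1).
M = np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)

G = M.T @ M                                  # Gram matrix of the class-mean directions
print(np.round(np.diag(G), 6))               # all ~1.0: equal lengths
print(np.round(G[0, 1], 6), -1 / (K - 1))    # off-diagonal ~ -1/(K-1): equiangular
```

In the imbalanced, bias-free MSE setting analyzed here, this common geometry is replaced by orthogonal class-mean vectors whose lengths depend on the amount of data in the corresponding classes, so the analogous Gram matrix would be diagonal with unequal entries.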
Related papers
- Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
arXiv Detail & Related papers (2024-02-29T06:02:45Z) - Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model [25.61363481391964]
We show that when the training dataset is class-imbalanced, some Neural Collapse (NC) properties will no longer be true.
In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model.
We find that the weights are aligned to the scaled and centered class-means, with scaling factors that depend on the number of training samples in each class.
arXiv Detail & Related papers (2024-01-04T04:53:31Z) - Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data [1.0152838128195467]
We study the extension of the Neural Collapse (NC) phenomenon to imbalanced data under the cross-entropy loss function.
Our contributions are multi-fold compared with the state-of-the-art results.
arXiv Detail & Related papers (2023-09-18T12:45:08Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training (see the sketch after this list).
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features [38.05002597295796]
An intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features of deep neural networks trained for classification tasks.
The class-means and last-layer classifiers collapse to the vertices of a Simplex Equiangular Tight Frame (ETF).
arXiv Detail & Related papers (2022-03-02T17:00:18Z) - Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model".
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z) - Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path [11.181590224799224]
Recent work discovered a phenomenon called Neural Collapse (NC) that occurs pervasively in today's deep net training paradigm.
In this work, we establish the empirical reality of MSE-NC by reporting experimental observations for three prototypical networks and five canonical datasets.
We produce closed-form dynamics that predict full Neural Collapse in an unconstrained features model.
arXiv Detail & Related papers (2021-06-03T18:31:41Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss function gradient flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
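The "Do We Really Need a Learnable Classifier at the End of Deep Neural Network?" entry above fixes the last-layer classifier to a randomly rotated simplex ETF and trains only the remaining layers. Below is a minimal PyTorch sketch of that idea; the class name FixedETFClassifier and the dimensions are hypothetical, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class FixedETFClassifier(nn.Module):
    """Last layer whose weight matrix is a frozen simplex ETF (illustrative sketch)."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        K = num_classes
        # Random partial orthogonal matrix P with P.T @ P = I_K.
        P, _ = torch.linalg.qr(torch.randn(feat_dim, K))
        etf = (K / (K - 1)) ** 0.5 * P @ (torch.eye(K) - torch.ones(K, K) / K)
        # A buffer is saved with the model but never receives gradient updates.
        self.register_buffer("weight", etf.t())      # shape (K, feat_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features @ self.weight.t()            # logits of shape (batch, K)

# Example usage with a hypothetical 512-dimensional backbone output and 10 classes:
# logits = FixedETFClassifier(512, 10)(backbone_features)
```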