Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse
in Imbalanced Training
- URL: http://arxiv.org/abs/2101.12699v1
- Date: Wed, 8 Sep 2021 18:33:51 GMT
- Title: Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse
in Imbalanced Training
- Authors: Cong Fang, Hangfeng He, Qi Long, Weijie J. Su
- Abstract summary: We introduce the \textit{Layer-Peeled Model}, a nonconvex yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term \textit{Minority Collapse}, which fundamentally limits the performance of deep learning models on the minority classes.
- Score: 39.137793683411424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the \textit{Layer-Peeled Model}, a nonconvex yet
analytically tractable optimization program, in a quest to better understand
deep neural networks that are trained for a sufficiently long time. As the name
suggests, this new model is derived by isolating the topmost layer from the
remainder of the neural network, followed by imposing certain constraints
separately on the two parts of the network. We demonstrate that the
Layer-Peeled Model, albeit simple, inherits many characteristics of
well-trained neural networks, thereby offering an effective tool for explaining
and predicting common empirical patterns of deep learning training. First, when
working on class-balanced datasets, we prove that any solution to this model
forms a simplex equiangular tight frame, which in part explains the recently
discovered phenomenon of neural collapse \cite{papyan2020prevalence}. More
importantly, when moving to the imbalanced case, our analysis of the
Layer-Peeled Model reveals a hitherto unknown phenomenon that we term
\textit{Minority Collapse}, which fundamentally limits the performance of deep
learning models on the minority classes. In addition, we use the Layer-Peeled
Model to gain insights into how to mitigate Minority Collapse. Interestingly,
this phenomenon is first predicted by the Layer-Peeled Model before being
confirmed by our computational experiments.
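To make the predicted geometry concrete: a simplex equiangular tight frame (ETF) for $K$ classes is a set of $K$ equal-norm vectors whose pairwise cosines all equal $-1/(K-1)$, the maximally separated configuration. Below is a minimal NumPy sketch, our own illustration rather than the authors' code, that constructs a simplex ETF and verifies both properties.

```python
import numpy as np

K = 4  # number of classes
# Simplex ETF: K unit-norm vectors in R^K with pairwise cosine -1/(K-1).
# (Columns live in the (K-1)-dim subspace orthogonal to the all-ones vector.)
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

gram = M.T @ M
assert np.allclose(np.diag(gram), 1.0)            # equal norms
off_diag = gram[~np.eye(K, dtype=bool)]
assert np.allclose(off_diag, -1.0 / (K - 1))      # equiangular
print("pairwise cosine:", off_diag[0])            # -1/3 for K = 4
```

Under neural collapse, the class-mean features and classifier rows of a well-trained network both converge to this configuration; Minority Collapse is the breakdown of this symmetry under class imbalance, with the minority-class classifiers drawing closer together.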
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Strong Model Collapse [16.071600606637908]
We consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon.
Our results show that even the smallest fraction of synthetic data can lead to model collapse.
We investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse.
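The setting can be sketched directly: real labels come from a ground-truth model plus noise, while synthetic labels come from an imperfect generator that was itself fit on limited data. The toy regression below is our own illustration of the setup (not the paper's construction, and it does not reproduce the scaling analysis behind the strong-collapse claim):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 20, 2000, 5000
w_true = rng.normal(size=d)

# An imperfect "generator": a model fit on a small real sample.
X0 = rng.normal(size=(50, d))
w_gen = np.linalg.lstsq(X0, X0 @ w_true + rng.normal(size=50), rcond=None)[0]

X_test = rng.normal(size=(n_test, d))
for frac in [0.0, 0.01, 0.1, 0.5]:
    X = rng.normal(size=(n, d))
    y = X @ w_true + rng.normal(size=n)          # real labels
    n_syn = int(frac * n)
    y[:n_syn] = X[:n_syn] @ w_gen                # synthetic labels
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    mse = np.mean((X_test @ (w_hat - w_true)) ** 2)
    print(f"synthetic fraction {frac:4.2f}: test MSE {mse:.4f}")
```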
arXiv Detail & Related papers (2024-10-07T08:54:23Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
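As a rough picture of what "sequential processing of weight subsets" can mean, the hypothetical helper below flattens a network's per-layer weights and cuts them into fixed-size chunks that a sequence model could consume one at a time; the function name and chunking scheme are our own, not SANE's API:

```python
import numpy as np

def weight_tokens(state, chunk=16):
    """Flatten per-layer weights and split them into fixed-size chunks
    ("tokens"), so large networks can be processed piecewise.
    Illustrative only; not the SANE implementation."""
    flat = np.concatenate([w.ravel() for w in state.values()])
    flat = np.pad(flat, (0, (-len(flat)) % chunk))   # pad to a multiple
    return flat.reshape(-1, chunk)

state = {"fc1": np.random.randn(8, 4), "fc2": np.random.randn(2, 8)}
print(weight_tokens(state).shape)   # (num_tokens, chunk)
```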
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - On the Role of Neural Collapse in Meta Learning Models for Few-shot
Learning [0.9729803206187322]
This study is the first to explore and understand the properties of neural collapse in meta learning frameworks for few-shot learning.
We perform studies on the Omniglot dataset in the few-shot setting and study the neural collapse phenomenon.
arXiv Detail & Related papers (2023-09-30T18:02:51Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
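A layer-wise version of this operation is easy to state precisely: interpolate two checkpoints parameter-wise, but only over a chosen subset of layers. The sketch below is our own illustration (plain arrays standing in for a framework's state dict), not the paper's experimental code:

```python
import numpy as np

def average_layers(sd_a, sd_b, layers=None, alpha=0.5):
    """Interpolate two checkpoints parameter-wise. If `layers` is given,
    average only those entries and keep model A's weights elsewhere."""
    merged = {}
    for name, w in sd_a.items():
        if layers is None or name in layers:
            merged[name] = (1 - alpha) * w + alpha * sd_b[name]
        else:
            merged[name] = w.copy()
    return merged

sd_a = {"layer1.w": np.random.randn(4, 4), "layer2.w": np.random.randn(2, 4)}
sd_b = {k: np.random.randn(*v.shape) for k, v in sd_a.items()}
merged = average_layers(sd_a, sd_b, layers={"layer1.w"})  # average one layer
```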
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - Perturbation Analysis of Neural Collapse [24.94449183555951]
Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point.
Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse.
We propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix.
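One way to write such a program (our paraphrase of the idea, not necessarily the paper's exact formulation) is to keep the unconstrained-features objective but restrict the learned features $H$ to a neighborhood of a reference matrix $H_0$:
\[
\min_{W,\,H}\; \mathcal{L}(W H,\, Y) + \frac{\lambda_W}{2}\lVert W\rVert_F^2
\quad \text{subject to} \quad \lVert H - H_0\rVert_F \le \delta,
\]
so that small $\delta$ models features pinned near $H_0$, while letting $\delta$ grow recovers the idealized model whose minimizers collapse exactly.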
arXiv Detail & Related papers (2022-10-29T17:46:03Z) - Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
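Schematically, the regularized-MSE unconstrained features model optimizes the classifier $W$ and the features $H$ (one column per training sample) jointly and directly:
\[
\min_{W,\,H,\,b}\; \frac{1}{2N}\bigl\lVert W H + b\mathbf{1}^{\top} - Y\bigr\rVert_F^2
+ \frac{\lambda_W}{2}\lVert W\rVert_F^2
+ \frac{\lambda_H}{2}\lVert H\rVert_F^2
+ \frac{\lambda_b}{2}\lVert b\rVert_2^2,
\]
where $Y$ is the one-hot label matrix over $N$ samples. The notation here is a common form of this model and may differ in detail from the paper's.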
arXiv Detail & Related papers (2022-02-16T14:17:37Z) - The Self-Simplifying Machine: Exploiting the Structure of Piecewise
Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology toward simplification and increased interpretability of Piecewise Linear Neural Networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
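The identity underlying the training-speed estimator is the chain rule of probability, $\log p(\mathcal{D}) = \sum_i \log p(y_i \mid y_{<i})$: a model whose one-step-ahead predictions improve quickly accumulates a large marginal likelihood. The sketch below verifies this for conjugate Bayesian linear regression, where the sequential updates are exact; it is our own illustration, not the paper's estimator for deep networks:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, noise = 3, 40, 0.5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + noise * rng.normal(size=n)

# Bayesian linear regression: prior w ~ N(0, I), known noise variance.
m, S = np.zeros(d), np.eye(d)     # posterior mean and covariance
log_ml = 0.0
for x_i, y_i in zip(X, y):
    # One-step-ahead predictive is N(m @ x_i, x_i S x_i + noise^2).
    var = x_i @ S @ x_i + noise**2
    log_ml += -0.5 * (np.log(2 * np.pi * var) + (y_i - m @ x_i) ** 2 / var)
    # Exact rank-one posterior update (Kalman-filter form).
    k = S @ x_i / var
    m = m + k * (y_i - m @ x_i)
    S = S - np.outer(k, x_i @ S)
print("log marginal likelihood:", log_ml)
```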
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - An analytic theory of shallow networks dynamics for hinge loss
classification [14.323962459195771]
We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address, in a simple setting, several phenomena appearing in modern networks, such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting.
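The analyzed setting is small enough to simulate directly. The sketch below, our own illustration rather than the paper's code, trains a single-hidden-layer ReLU network with a linear hinge loss on linearly separable data, using hand-written gradients:

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, lr = 200, 50, 0.1
X = rng.normal(size=(n, 2))
y = np.sign(X[:, 0])              # linearly separable labels

W1 = rng.normal(size=(2, h)) / np.sqrt(2.0)   # hidden-layer weights
w2 = rng.normal(size=h) / np.sqrt(h)          # output weights

for _ in range(500):
    Z = X @ W1                    # hidden pre-activations
    A = np.maximum(Z, 0.0)        # ReLU
    f = A @ w2                    # network output
    loss = np.mean(np.maximum(0.0, 1.0 - y * f))
    g = -(y * (y * f < 1.0)) / n                    # dLoss/df on the hinge
    w2 -= lr * (A.T @ g)
    W1 -= lr * (X.T @ ((g[:, None] * w2) * (Z > 0)))
print("final hinge loss:", loss)
```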
arXiv Detail & Related papers (2020-06-19T16:25:29Z)