Extended Unconstrained Features Model for Exploring Deep Neural Collapse
- URL: http://arxiv.org/abs/2202.08087v1
- Date: Wed, 16 Feb 2022 14:17:37 GMT
- Title: Extended Unconstrained Features Model for Exploring Deep Neural Collapse
- Authors: Tom Tirer, Joan Bruna
- Abstract summary: Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
- Score: 59.59039125375527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modern strategy for training deep neural networks for classification
tasks includes optimizing the network's weights even after the training error
vanishes to further push the training loss toward zero. Recently, a phenomenon
termed "neural collapse" (NC) has been empirically observed in this training
procedure. Specifically, it has been shown that the learned features (the
output of the penultimate layer) of within-class samples converge to their
mean, and the means of different classes exhibit a certain tight frame
structure, which is also aligned with the last layer's weights. Recent papers
have shown that minimizers with this structure emerge when optimizing a
simplified "unconstrained features model" (UFM) with a regularized
cross-entropy loss. In this paper, we further analyze and extend the UFM.
First, we study the UFM for the regularized MSE loss, and show that the
minimizers' features can be more structured than in the cross-entropy case.
This also affects the structure of the weights. Then, we extend the UFM by
adding another layer of weights as well as ReLU nonlinearity to the model and
generalize our previous results. Finally, we empirically demonstrate the
usefulness of our nonlinear extended UFM in modeling the NC phenomenon that
occurs with practical networks.
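As a rough, self-contained illustration of the setup described above, the NumPy sketch below minimizes a regularized-MSE unconstrained features model by plain gradient descent and then reports the neural-collapse statistics mentioned in the abstract (within-class variability, the structure of the class-mean Gram matrix, and the alignment between class means and last-layer weights). The dimensions, regularization weights, step size, and optimizer choice are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np

# Minimal sketch of the unconstrained features model (UFM) with a regularized MSE
# loss: the penultimate-layer features H are treated as free optimization variables
# alongside the last-layer weights W. All hyperparameters below are illustrative.

rng = np.random.default_rng(0)

K, n, d = 4, 32, 16                  # classes, samples per class, feature dimension
N = K * n
lam_W, lam_H = 5e-3, 5e-3            # weight / feature regularization strengths
lr, steps = 0.2, 30000

labels = np.repeat(np.arange(K), n)
Y = np.eye(K)[:, labels]             # one-hot targets, shape (K, N)
W = rng.standard_normal((K, d)) / np.sqrt(d)   # last-layer weights
H = rng.standard_normal((d, N)) / np.sqrt(d)   # unconstrained features

def objective(W, H):
    E = W @ H - Y
    return 0.5 / N * np.sum(E**2) + 0.5 * lam_W * np.sum(W**2) + 0.5 * lam_H * np.sum(H**2)

for _ in range(steps):
    E = W @ H - Y                    # residual of the linear classifier
    grad_W = E @ H.T / N + lam_W * W
    grad_H = W.T @ E / N + lam_H * H
    W -= lr * grad_W
    H -= lr * grad_H

print(f"objective after training: {objective(W, H):.6f}")

means = np.stack([H[:, labels == k].mean(axis=1) for k in range(K)], axis=1)  # (d, K)
global_mean = means.mean(axis=1, keepdims=True)

# NC1: within-class variability collapses relative to between-class variability.
within = sum(np.sum((H[:, labels == k] - means[:, [k]]) ** 2) for k in range(K)) / N
between = np.sum((means - global_mean) ** 2) / K
print(f"within-/between-class variability ratio: {within / between:.2e}")

# NC2: the class means exhibit a regular (tight-frame-like) structure.
M = means / np.linalg.norm(means, axis=0, keepdims=True)
print("Gram matrix of normalized class means:\n", np.round(M.T @ M, 3))

# NC3: the last-layer weight rows align with the corresponding class means.
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
print("cos(weight row k, class mean k):", np.round(np.sum(Wn * M.T, axis=1), 3))
```
The paper's extended UFM additionally inserts another weight matrix and a ReLU nonlinearity in front of the linear classifier; the sketch above covers only the plain one-layer MSE case.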
Related papers
- Supervised Contrastive Representation Learning: Landscape Analysis with
Unconstrained Features [33.703796571991745]
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer.
These results indicate that the final-layer outputs in such networks display minimal within-class variations.
arXiv Detail & Related papers (2024-02-29T06:02:45Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse
Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the training accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced
Data [12.225207401994737]
We show that complex systems with massive numbers of parameters exhibit the same structural properties when trained until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate that the last-layer features and classifiers converge to a common geometric structure.
arXiv Detail & Related papers (2023-01-01T16:29:56Z) - Perturbation Analysis of Neural Collapse [24.94449183555951]
Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point.
Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse.
We propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix.
arXiv Detail & Related papers (2022-10-29T17:46:03Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM).
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse
in Imbalanced Training [39.137793683411424]
We introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
arXiv Detail & Related papers (2021-01-29T17:37:17Z) - Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we recover a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.