Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias
- URL: http://arxiv.org/abs/2402.03991v1
- Date: Tue, 6 Feb 2024 13:44:39 GMT
- Title: Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias
- Authors: Emanuele Zangrando, Piero Deidda, Simone Brugiapaglia, Nicola Guglielmi, Francesco Tudisco
- Abstract summary: We show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with networks' neural collapse properties.
As the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers.
- Score: 4.829265670567825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in deep learning has shown strong empirical and theoretical
evidence of an implicit low-rank bias: weight matrices in deep networks tend to
be approximately low-rank and removing relatively small singular values during
training or from available trained models may significantly reduce model size
while maintaining or even improving model performance. However, the majority of
the theoretical investigations around low-rank bias in neural networks deal
with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations trained with weight decay, and we show
the presence of an intriguing neural rank collapse phenomenon, connecting the
low-rank bias of trained networks with networks' neural collapse properties: as
the weight decay parameter grows, the rank of each layer in the network
decreases proportionally to the within-class variability of the hidden-space
embeddings of the previous layers. Our theoretical findings are supported by a
range of experimental evaluations illustrating the phenomenon.
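To make the quantities discussed in the abstract concrete, the sketch below (a plain NumPy illustration, not the authors' code; the rank tolerance, the toy matrices, and the function names are assumptions made here) computes the numerical rank of a weight matrix, the low-rank approximation obtained by removing its small singular values, and a standard within-class variability measure for hidden embeddings.

# Minimal sketch; assumptions: NumPy only, illustrative tolerance and toy data.
import numpy as np


def numerical_rank(W, tol=1e-3):
    # Number of singular values above tol times the largest singular value.
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


def truncate_small_singular_values(W, k):
    # Best rank-k approximation of W: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def within_class_variability(H, labels):
    # Average squared distance of hidden embeddings to their class means,
    # a standard neural-collapse-style measure of within-class spread.
    total = 0.0
    for c in np.unique(labels):
        Hc = H[labels == c]
        total += np.sum((Hc - Hc.mean(axis=0)) ** 2)
    return total / H.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for a trained layer: a weight matrix with decaying spectrum
    # and hidden embeddings of 300 samples from 3 classes.
    W = rng.standard_normal((64, 64)) * np.exp(-0.2 * np.arange(64))
    H = rng.standard_normal((300, 64))
    labels = rng.integers(0, 3, size=300)

    r = numerical_rank(W)
    print("numerical rank of W:", r)
    print("within-class variability of H:", within_class_variability(H, labels))

    # Compress W by removing its small singular values, as described above.
    W_low = truncate_small_singular_values(W, r)
    err = np.linalg.norm(W - W_low) / np.linalg.norm(W)
    print("relative error of the rank-%d approximation: %.2e" % (r, err))

Repeating these measurements on an actual trained network for increasing weight decay values is the kind of check that, by the paper's claim, should show each layer's rank shrinking along with the within-class variability of the previous layer's embeddings.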
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network [8.79431718760617]
Training with mini-batch SGD and weight decay induces a bias toward rank minimization in weight matrices.
We show that this bias becomes more pronounced with smaller batch sizes, higher learning rates, or stronger weight decay.
We empirically explore the connection between this bias and generalization, finding that it has a marginal effect on the test performance.
arXiv Detail & Related papers (2022-06-12T17:06:35Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while remaining at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Gradient-trained Weights in Wide Neural Networks Align Layerwise to Error-scaled Input Correlations [11.176824373696324]
We derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent.
We formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation.
arXiv Detail & Related papers (2021-06-15T21:56:38Z)
- Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z)
- Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training [39.137793683411424]
We introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
arXiv Detail & Related papers (2021-01-29T17:37:17Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)