The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
- URL: http://arxiv.org/abs/2405.15706v2
- Date: Tue, 28 May 2024 14:17:51 GMT
- Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo
- Abstract summary: Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics.
We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse.
- Score: 6.554326244334867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many of the recent remarkable advances in computer vision and language models can be attributed to the success of transfer learning via the pre-training of large foundation models. However, a theoretical framework which explains this empirical success is incomplete and remains an active area of research. Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics which shed light on the implicit biases underlying pre-training. In this paper, we explore the geometric complexity of a model's learned representations as a fundamental mechanism that relates these two concepts. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting.
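The abstract relates two pre-training metrics without defining them here. As an illustration only, the sketch below implements common textbook-style proxies for both, which are assumptions on my part rather than the paper's exact formulations: neural collapse (NC1) as the ratio of within-class to between-class feature scatter, and geometric complexity as the mean squared norm of the model's input gradient, estimated by finite differences.

```python
import numpy as np

def nc1_variability(features, labels):
    """Proxy for neural collapse (NC1): within-class scatter divided by
    between-class scatter of the learned features. As collapse progresses,
    features concentrate at their class means and this ratio shrinks
    toward zero. (Simplified trace-ratio form; an assumption, not the
    paper's exact definition.)"""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    sw = 0.0  # within-class scatter
    sb = 0.0  # between-class scatter
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        sw += ((fc - mu_c) ** 2).sum()
        sb += len(fc) * ((mu_c - global_mean) ** 2).sum()
    return sw / sb

def geometric_complexity(f, X, eps=1e-4):
    """Proxy for geometric complexity: the average squared Frobenius norm
    of the input-Jacobian of f over the data X, estimated with central
    finite differences (an assumed simplification of the gradient-norm
    definition used in the geometric-complexity literature)."""
    n, d = X.shape
    total = 0.0
    for i in range(n):
        for j in range(d):
            e = np.zeros(d)
            e[j] = eps
            diff = (f(X[i] + e) - f(X[i] - e)) / (2 * eps)
            total += float(np.sum(diff ** 2))
    return total / n

# Toy check: three tight clusters give a near-zero NC1 ratio,
# and a linear map x -> 2x over d=2 inputs has Jacobian 2I,
# so its geometric complexity is ||2I||_F^2 = 4d.
rng = np.random.default_rng(0)
means = np.array([[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1, 2], 100)
feats = means[labels] + 0.01 * rng.standard_normal((300, 2))
print(nc1_variability(feats, labels))
print(geometric_complexity(lambda x: 2.0 * x, np.zeros((3, 2))))
```

Under these toy definitions, driving down the input-gradient norm of the feature map also tightens clusters around class means, which is the qualitative link between the two quantities that the abstract describes.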
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse [19.279084204631204]
We extend the investigation of Neural Collapse to biased datasets with imbalanced attributes.
We propose an avoid-shortcut learning framework without additional training complexity.
With well-designed shortcut primes based on Neural Collapse structure, the models are encouraged to skip the pursuit of simple shortcuts.
arXiv Detail & Related papers (2024-05-09T07:23:37Z)
- A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes [49.32130498861987]
We study the case of non-differentiable activation functions, such as ReLU.
Two recent works introduced a geometric framework to study neural networks.
We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.
arXiv Detail & Related papers (2024-04-09T08:11:46Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Generalized Neural Collapse for a Large Number of Classes [33.46269920297418]
We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks.
We provide a theoretical study to show that generalized neural collapse provably occurs under the unconstrained feature model with a spherical constraint.
arXiv Detail & Related papers (2023-10-09T02:27:04Z)
- An Analytic Framework for Robust Training of Artificial Neural Networks [5.7365885616661405]
It is difficult to describe the phenomenon due to the complicated nature of the problems in machine learning.
This paper makes use of complex analysis and holomorphicity to offer a robust learning rule for artificial neural networks.
arXiv Detail & Related papers (2022-05-26T17:16:39Z)
- Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.