Average gradient outer product as a mechanism for deep neural collapse
- URL: http://arxiv.org/abs/2402.13728v2
- Date: Thu, 23 May 2024 19:36:39 GMT
- Title: Average gradient outer product as a mechanism for deep neural collapse
- Authors: Daniel Beaglehole, Peter Súkeník, Marco Mondelli, Mikhail Belkin,
- Abstract summary: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs)
In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP)
We demonstrate theoretically and empirically that DNC occurs in Deep Recursive Feature Machines as a consequence of the projection with the AGOP matrix computed at each layer.
- Score: 26.939895223897572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. Deep Recursive Feature Machines are a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate theoretically and empirically that DNC occurs in Deep Recursive Feature Machines as a consequence of the projection with the AGOP matrix computed at each layer. We then provide evidence that this mechanism holds for neural networks more generally. We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
Related papers
- Deep Learning as Ricci Flow [38.27936710747996]
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data.
We show that the transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow.
Our findings motivate the use of tools from differential and discrete geometry to the problem of explainability in deep learning.
arXiv Detail & Related papers (2024-04-22T15:12:47Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
We show that structured data combined with an intrinsic Occam's razor-like inductive bias towards simple functions counteracts the exponential growth of functions with complexity.
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is a key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z) - Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced
Data [12.225207401994737]
We show that complex systems with massive amounts of parameters exhibit the same structural properties when training until convergence.
In particular, it has been observed that the last-layer features collapse to their class-means.
Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of vectors.
arXiv Detail & Related papers (2023-01-01T16:29:56Z) - Variational Inference for Infinitely Deep Neural Networks [0.4061135251278187]
unbounded depth neural network (UDN)
We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data.
We study the UDN on real and synthetic data.
arXiv Detail & Related papers (2022-09-21T03:54:34Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Neural Networks and Polynomial Regression. Demystifying the
Overparametrization Phenomena [17.205106391379026]
In the context of neural network models, overparametrization refers to the phenomena whereby these models appear to generalize well on the unseen data.
A conventional explanation of this phenomena is based on self-regularization properties of algorithms used to train the data.
We show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension.
arXiv Detail & Related papers (2020-03-23T20:09:31Z) - Dropout: Explicit Forms and Capacity Control [57.36692251815882]
We investigate capacity control provided by dropout in various machine learning problems.
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
arXiv Detail & Related papers (2020-03-06T19:10:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.