Nonlinear spiked covariance matrices and signal propagation in deep
neural networks
- URL: http://arxiv.org/abs/2402.10127v1
- Date: Thu, 15 Feb 2024 17:31:19 GMT
- Title: Nonlinear spiked covariance matrices and signal propagation in deep
neural networks
- Authors: Zhichao Wang, Denny Wu, Zhou Fan
- Abstract summary: We study the eigenvalue spectrum of the Conjugate Kernel defined by a nonlinear feature map of a feedforward neural network.
In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model.
We also study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training.
- Score: 22.84097371842279
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent works have studied the eigenvalue spectrum of the Conjugate
Kernel (CK) defined by the nonlinear feature map of a feedforward neural
network. However, existing results only establish weak convergence of the
empirical eigenvalue distribution, and fall short of providing precise
quantitative characterizations of the "spike" eigenvalues and eigenvectors
that often capture the low-dimensional signal structure of the learning
problem. In this work, we characterize these signal eigenvalues and
eigenvectors for a nonlinear version of the spiked covariance model, including
the CK as a special case. Using this general result, we give a quantitative
description of how spiked eigenstructure in the input data propagates through
the hidden layers of a neural network with random weights. As a second
application, we study a simple regime of representation learning where the
weight matrix develops a rank-one signal component over training and
characterize the alignment of the target function with the spike eigenvector of
the CK on test data.
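To make the setting concrete, here is a minimal numerical sketch of the first application described above: data drawn from a rank-one spiked covariance model are passed through a single random-weight layer with a centered nonlinearity, and the top eigenvalue/eigenvector of the resulting Conjugate Kernel (CK) Gram matrix are compared against the planted signal. The dimensions, spike strength, tanh activation, and 1/width Gram scaling are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch (assumed setup, not the paper's exact model): a rank-one
# spike in the input covariance propagates into a spike in the CK spectrum
# after one random-weight layer with a centered (odd) activation.
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 2000, 1000, 1500       # samples, input dim, hidden width (assumed)
theta = 5.0                          # planted spike strength (assumed)

# Rank-one spiked input: each row is sqrt(theta) * z_i * u plus Gaussian noise.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)               # unit-norm signal direction
z = rng.standard_normal(n)           # per-sample signal coefficients
X = np.sqrt(theta) * np.outer(z, u) + rng.standard_normal((n, d))

# One hidden layer with i.i.d. Gaussian weights and a tanh activation.
W = rng.standard_normal((d, width)) / np.sqrt(d)
feats = np.tanh(X @ W)               # feature matrix, shape (n, width)

# Conjugate Kernel Gram matrix on the sample.
K = feats @ feats.T / width

# The spike eigenvalue should separate from the bulk, and its eigenvector
# should correlate with the planted signal coefficients z.
evals, evecs = np.linalg.eigh(K)     # eigenvalues in ascending order
top_vec = evecs[:, -1]
alignment = abs(top_vec @ z) / np.linalg.norm(z)
print(f"top CK eigenvalue: {evals[-1]:.2f}, next: {evals[-2]:.2f}")
print(f"alignment of top CK eigenvector with the signal: {alignment:.3f}")
```

With these (assumed) aspect ratios and spike strength, the top CK eigenvalue detaches from the bulk and its eigenvector retains a visible correlation with the planted signal, which is the kind of quantitative behavior the paper characterizes.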
Related papers
- Generalization for Least Squares Regression With Simple Spiked Covariances [3.9134031118910264]
The generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood.
Recent work has made progress by describing the spectrum of the feature matrix at the hidden layer.
Yet, the generalization error for linear models with spiked covariances has not been previously determined.
arXiv Detail & Related papers (2024-10-17T19:46:51Z) - Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction [55.57072563835959]
Spectral graph neural networks are characterized by filters applied to the eigenvalues of the graph Laplacian.
We propose an eigenvalue correction strategy that can free filters from the constraints of repeated eigenvalue inputs.
arXiv Detail & Related papers (2024-01-28T08:12:00Z) - Non Commutative Convolutional Signal Models in Neural Networks:
Stability to Small Deformations [111.27636893711055]
We study the filtering and stability properties of non-commutative convolutional filters.
Our results have direct implications for group neural networks, multigraph neural networks and quaternion neural networks.
arXiv Detail & Related papers (2023-10-05T20:27:22Z) - A theory of data variability in Neural Network Bayesian inference [0.70224924046445]
We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a $(\varphi^3 + \varphi^4)$-theory.
arXiv Detail & Related papers (2023-07-31T14:11:32Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Convolutional Filtering and Neural Networks with Non Commutative
Algebras [153.20329791008095]
We study the generalization of non-commutative convolutional neural networks.
We show that non-commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z) - Linear approximability of two-layer neural networks: A comprehensive
analysis based on spectral decay [4.042159113348107]
We first consider the case of a single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z) - A simpler spectral approach for clustering in directed networks [1.52292571922932]
We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than commonly used alternatives.
We provide numerical evidence for the superiority of the Gaussian Mixture clustering over the widely used k-means algorithm.
arXiv Detail & Related papers (2021-02-05T14:16:45Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Spectra of the Conjugate Kernel and Neural Tangent Kernel for
linear-width neural networks [22.57374777395746]
We study the eigenvalue distributions of the Conjugate Kernel (CK) and Neural Tangent Kernel (NTK) associated with feedforward neural networks.
We show that the eigenvalue distributions of the CK and NTK converge to deterministic limits; a short numerical sketch after this list illustrates this concentration.
arXiv Detail & Related papers (2020-05-25T01:11:49Z)
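The last entry above is closely related to the present paper. As a companion to it, here is a self-contained sketch of the concentration phenomenon it describes: the empirical CK and NTK spectra of a one-hidden-layer network with Gaussian data and weights are essentially deterministic at moderately large, proportional sizes, so two independent draws produce nearly identical eigenvalue quantiles. The sizes, the tanh activation, the Gaussian output-layer weights, and the particular NTK parameterization are illustrative assumptions.

```python
# Sketch (assumed setup): empirical CK and NTK spectra concentrate, so two
# independent draws at the same proportional sizes give nearly identical
# eigenvalue quantiles.
import numpy as np

def ck_ntk_eigs(n=1500, d=1000, width=1200, seed=0):
    """Empirical CK and NTK eigenvalues for one random tanh layer (assumed model)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))                   # i.i.d. Gaussian inputs
    W = rng.standard_normal((d, width)) / np.sqrt(d)  # first-layer weights
    a = rng.standard_normal(width)                    # output-layer weights (assumed Gaussian)
    feats = np.tanh(X @ W)
    deriv = 1.0 - feats**2                            # tanh'(z) = 1 - tanh(z)^2
    CK = feats @ feats.T / width
    # NTK = CK (output-layer gradients) + input Gram weighted by activation
    # derivatives (first-layer gradients), under this parameterization.
    NTK = CK + (X @ X.T / d) * ((deriv * a**2) @ deriv.T / width)
    return np.linalg.eigvalsh(CK), np.linalg.eigvalsh(NTK)

q = [0.25, 0.5, 0.75, 1.0]
for seed in (0, 1):                                   # two independent draws
    ck, ntk = ck_ntk_eigs(seed=seed)
    print(f"seed {seed}: CK quantiles {np.quantile(ck, q).round(3)}, "
          f"NTK quantiles {np.quantile(ntk, q).round(3)}")
```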