Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural
Networks
- URL: http://arxiv.org/abs/2308.16800v2
- Date: Wed, 21 Feb 2024 08:57:18 GMT
- Title: Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural
Networks
- Authors: Andreas Roth, Thomas Liebig
- Abstract summary: Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks.
We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior unaffected by feature transformations.
We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
- Score: 4.213427823201119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our study reveals new theoretical insights into over-smoothing and feature
over-correlation in deep graph neural networks. We show the prevalence of
invariant subspaces, demonstrating a fixed relative behavior that is unaffected
by feature transformations. Our work clarifies recent observations related to
convergence to a constant state and a potential over-separation of node states,
as the amplification of subspaces only depends on the spectrum of the
aggregation function. In linear scenarios, this leads to node representations
being dominated by a low-dimensional subspace with an asymptotic convergence
rate independent of the feature transformations. This causes a rank collapse of
the node representations, resulting in over-smoothing when smooth vectors span
this subspace, and over-correlation even when over-smoothing is avoided. Guided
by our theory, we propose a sum of Kronecker products as a beneficial property
that can provably prevent over-smoothing, over-correlation, and rank collapse.
We empirically extend our insights to the non-linear case, demonstrating the
inability of existing models to capture linearly independent features.
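The linear case described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code: the ring graph, the symmetric normalized adjacency with self-loops, the Gaussian weight matrices, the per-layer rescaling, and all function names below are assumptions chosen for illustration. The sketch applies X <- A X W repeatedly with a fresh random W at every layer and tracks the ratio of the second-largest to the largest singular value of X, which tends toward zero as the representations approach rank one. It also checks the standard identity vec(A X W) = (W^T ⊗ A) vec(X), which is what makes a single layer a Kronecker product acting on the vectorized features.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Symmetric normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    adj = np.eye(num_nodes)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def singular_value_ratio(x):
    """Second-largest over largest singular value; near 0 means x is almost rank one."""
    s = np.linalg.svd(x, compute_uv=False)
    return s[1] / s[0]


def run_linear_gnn(agg, x, num_layers, rng):
    """Apply X <- A X W with a fresh random W per layer (illustrative assumption)."""
    ratios = []
    for _ in range(num_layers):
        w = rng.standard_normal((x.shape[1], x.shape[1]))
        x = agg @ x @ w
        x = x / np.linalg.norm(x)  # rescale to unit Frobenius norm to avoid overflow
        ratios.append(singular_value_ratio(x))
    return x, ratios


rng = np.random.default_rng(0)
num_nodes, num_features = 8, 4

# Hypothetical toy graph: an 8-node ring.
ring_edges = [(i, (i + 1) % num_nodes) for i in range(num_nodes)]
agg = normalized_adjacency(ring_edges, num_nodes)

x0 = rng.standard_normal((num_nodes, num_features))
initial_ratio = singular_value_ratio(x0)
x_final, ratios = run_linear_gnn(agg, x0, num_layers=128, rng=rng)

# One layer as a Kronecker product on the column-stacked features:
# vec(A X W) = (W^T kron A) vec(X).
w = rng.standard_normal((num_features, num_features))
lhs = (agg @ x0 @ w).flatten(order="F")
rhs = np.kron(w.T, agg) @ x0.flatten(order="F")
kron_identity_holds = np.allclose(lhs, rhs)

print(f"singular value ratio before any layer: {initial_ratio:.3e}")
print(f"singular value ratio after 128 layers: {ratios[-1]:.3e}")
print(f"vec identity holds: {kron_identity_holds}")
```

On this toy graph every node has the same degree, so the dominant eigenvector of the aggregation matrix is constant and the collapse shows up as over-smoothing (the rows of `x_final` become nearly identical). The sketch only covers a single Kronecker product per layer; it does not attempt to reproduce the sum of Kronecker products that the abstract proposes as a remedy.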
Related papers
- A Low Rank Neural Representation of Entropy Solutions [0.0]
We construct a new representation of entropy solutions to nonlinear scalar conservation laws with a smooth convex flux function in a single spatial dimension.
We show that the low rank neural representation with a fixed number of layers and a small number of coefficients can approximate any entropy solution regardless of the complexity of the shock topology.
arXiv Detail & Related papers (2024-06-09T08:37:11Z)
- Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model [0.0]
We study low-dimensional structure in the weights, Hessians, gradients, and feature vectors of deep neural networks.
We show how they can be unified within a generalized unconstrained feature model.
arXiv Detail & Related papers (2024-04-09T08:17:32Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- OrthoReg: Improving Graph-regularized MLPs via Orthogonality Regularization [66.30021126251725]
Graph Neural Networks (GNNs) currently dominate the modeling of graph-structured data.
Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, while their performance can hardly match that of GNNs in most tasks.
We show that GR-MLPs suffer from dimensional collapse, a phenomenon in which a few of the largest eigenvalues dominate the embedding space.
We propose OrthoReg, a novel GR-MLP model to mitigate the dimensional collapse issue.
arXiv Detail & Related papers (2023-01-31T21:20:48Z)
- Optimization-Induced Graph Implicit Nonlinear Diffusion [64.39772634635273]
We propose a new kind of graph convolution variant, called Graph Implicit Nonlinear Diffusion (GIND).
GIND implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing.
We show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective.
arXiv Detail & Related papers (2022-06-29T06:26:42Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent delineates the behaviour of models depending on the regime they belong to.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- Harmless interpolation in regression and classification with structured features [21.064512161584872]
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data.
We present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-11-09T15:12:26Z)
- On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
- Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- Implicit Regularization in ReLU Networks with the Square Loss [56.70360094597169]
We show that it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters.
Our results suggest that a more general framework may be needed to understand implicit regularization for nonlinear predictors.
arXiv Detail & Related papers (2020-12-09T16:48:03Z)