Related papers: Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks

URL: http://arxiv.org/abs/2308.16800v2
Date: Wed, 21 Feb 2024 08:57:18 GMT
Title: Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks
Authors: Andreas Roth, Thomas Liebig
Abstract summary: Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior unaffected by feature transformations. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
Score: 4.213427823201119
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior that is unaffected by feature transformations. Our work clarifies recent observations related to convergence to a constant state and a potential over-separation of node states, as the amplification of subspaces only depends on the spectrum of the aggregation function. In linear scenarios, this leads to node representations being dominated by a low-dimensional subspace with an asymptotic convergence rate independent of the feature transformations. This causes a rank collapse of the node representations, resulting in over-smoothing when smooth vectors span this subspace, and over-correlation even when over-smoothing is avoided. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that can provably prevent over-smoothing, over-correlation, and rank collapse. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
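
The linear-case claim in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code or experimental setup. The ring graph, the GCN-style normalized adjacency, the random orthogonal feature transformations, and the helper names (`ring_graph`, `sym_norm_adjacency`, `rank_one_distance`, `run_linear_gnn`) are all assumptions chosen for illustration. Orthogonal transformations are used so that any loss of rank in this toy setting comes from the aggregation step alone.

```python
import numpy as np


def ring_graph(n):
    """Adjacency matrix of an undirected ring with n nodes (illustrative graph)."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = 1.0
        adj[(i + 1) % n, i] = 1.0
    return adj


def sym_norm_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}: an assumed GCN-style aggregation matrix."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def rank_one_distance(x):
    """Share of squared Frobenius norm outside the top singular direction.

    A value of 0 means the feature matrix is exactly rank one.
    """
    s = np.linalg.svd(x, compute_uv=False)
    return max(0.0, 1.0 - s[0] ** 2 / np.sum(s ** 2))


def run_linear_gnn(adj, num_layers=64, dim=8, seed=0):
    """Iterate X <- A X W with a fresh random orthogonal W at every layer."""
    rng = np.random.default_rng(seed)
    agg = sym_norm_adjacency(adj)
    x = rng.standard_normal((adj.shape[0], dim))
    distances = [rank_one_distance(x)]
    for _ in range(num_layers):
        # Random orthogonal W: it preserves singular values, so W itself
        # cannot collapse the rank of the node representations.
        w, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
        x = agg @ x @ w
        x = x / np.linalg.norm(x)  # rescale only to keep magnitudes stable
        distances.append(rank_one_distance(x))
    return distances


distances = run_linear_gnn(ring_graph(10))
for layer in (0, 8, 16, 32, 64):
    print(f"layer {layer:2d}: rank-one distance = {distances[layer]:.3e}")
```

In this toy setting the printed distance shrinks toward zero as depth grows, because repeated aggregation amplifies the dominant eigenvector of the aggregation matrix relative to all others. The sketch covers only the linear case with one particular aggregation function and does not implement the sum-of-Kronecker-products property proposed in the paper.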

Related papers

A Low Rank Neural Representation of Entropy Solutions [0.0]
We construct a new representation of entropy solutions to nonlinear scalar conservation laws with a smooth convex flux function in a single spatial dimension. We show that the low rank neural representation with a fixed number of layers and a small number of coefficients can approximate any entropy solution regardless of the complexity of the shock topology.
arXiv Detail & Related papers (2024-06-09T08:37:11Z)
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model [0.0]
We study low-dimensional structure in the weights, Hessians, gradients, and feature vectors of deep neural networks. We show how they can be unified within a generalized unconstrained feature model.
arXiv Detail & Related papers (2024-04-09T08:17:32Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
OrthoReg: Improving Graph-regularized MLPs via Orthogonality Regularization [66.30021126251725]
Graph Neural Networks (GNNs) currently dominate the modeling of graph-structured data. Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, but their performance can hardly match that of GNNs in most tasks. We show that GR-MLPs suffer from dimensional collapse, a phenomenon in which a few of the largest eigenvalues dominate the embedding space. We propose OrthoReg, a novel GR-MLP model, to mitigate the dimensional collapse issue.
arXiv Detail & Related papers (2023-01-31T21:20:48Z)
Optimization-Induced Graph Implicit Nonlinear Diffusion [64.39772634635273]
We propose a new graph convolution variant, called Graph Implicit Nonlinear Diffusion (GIND). GIND implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. We show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective.
arXiv Detail & Related papers (2022-06-29T06:26:42Z)
Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent delineates the behaviour of models depending on the regime they belong to. We use influence functions to derive suitable expressions of the population loss and its lower bound. Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
Harmless interpolation in regression and classification with structured features [21.064512161584872]
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. We present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-11-09T15:12:26Z)
On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes. We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks. We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
Implicit Regularization in ReLU Networks with the Square Loss [56.70360094597169]
We show that it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters. Our results suggest that a more general framework may be needed to understand implicit regularization for nonlinear predictors.
arXiv Detail & Related papers (2020-12-09T16:48:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.