Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
- URL: http://arxiv.org/abs/2406.02997v2
- Date: Wed, 12 Jun 2024 09:06:10 GMT
- Title: Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
- Authors: Michael Scholkemper, Xinyi Wu, Ali Jadbabaie, Michael T. Schaub
- Abstract summary: We provide a formal and precise characterization of (linearized) graph neural networks (GNNs) with residual connections and normalization layers.
We show that the centering step of a normalization layer alters the graph signal in message-passing in such a way that relevant information can become harder to extract.
We introduce a novel, principled normalization layer called GraphNormv2 in which the centering step is learned such that it does not distort the original graph signal in an undesirable way.
- Score: 30.003409099607204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Residual connections and normalization layers have become standard design choices for graph neural networks (GNNs), and were proposed as solutions to mitigate the oversmoothing problem in GNNs. However, how exactly these methods help alleviate the oversmoothing problem from a theoretical perspective is not well understood. In this work, we provide a formal and precise characterization of (linearized) GNNs with residual connections and normalization layers. We establish that (a) for residual connections, the incorporation of the initial features at each layer can prevent the signal from becoming too smooth, and determines the subspace of possible node representations; (b) batch normalization prevents a complete collapse of the output embedding space to a one-dimensional subspace through the individual rescaling of each column of the feature matrix. This results in the convergence of node representations to the top-$k$ eigenspace of the message-passing operator; (c) moreover, we show that the centering step of a normalization layer -- which can be understood as a projection -- alters the graph signal in message-passing in such a way that relevant information can become harder to extract. We therefore introduce a novel, principled normalization layer called GraphNormv2 in which the centering step is learned such that it does not distort the original graph signal in an undesirable way. Experimental results confirm the effectiveness of our method.
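A minimal sketch of the linearized setting described above, assuming a symmetrically normalized adjacency as the message-passing operator; the initial-residual mixing weight `alpha` and the exact normalization recipe are illustrative assumptions rather than the paper's precise formulation:

```python
import numpy as np

def linearized_gnn(A, X0, num_layers=32, alpha=0.1, eps=1e-8):
    """Linearized message passing with an initial residual connection and a
    batch-norm-style normalization (center each column, then rescale it).
    A: (n, n) adjacency matrix, X0: (n, d) input node features."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, eps)))
    P = D_inv_sqrt @ A @ D_inv_sqrt          # symmetrically normalized operator

    X = X0.copy()
    for _ in range(num_layers):
        # residual connection: mix the propagated signal with the initial
        # features, which keeps the representation from smoothing completely
        X = (1 - alpha) * (P @ X) + alpha * X0
        # centering: project each column onto the complement of the all-ones vector
        X = X - X.mean(axis=0, keepdims=True)
        # per-column rescaling: prevents collapse onto a single direction
        X = X / (X.std(axis=0, keepdims=True) + eps)
    return X

# toy usage on a small random graph
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T
X = rng.normal(size=(20, 4))
print(linearized_gnn(A, X).shape)  # (20, 4)
```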
Related papers
- Scalable Graph Compressed Convolutions [68.85227170390864]
We propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution.
Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning.
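A hedged, self-contained sketch of the calibrate-then-convolve idea; the degree-sorting permutation below is a hypothetical stand-in for CoCN's learned differentiable permutation, and `calibrate_and_convolve` is not the paper's API:

```python
import numpy as np

def calibrate_and_convolve(A, X, kernel):
    """Reorder nodes with a calibration permutation, then apply a Euclidean
    1D convolution along the permuted node axis."""
    perm = np.argsort(-A.sum(axis=1))          # hypothetical calibration: sort by degree
    X_cal = X[perm]                            # permuted node features
    k = len(kernel)
    pad = k // 2
    X_pad = np.pad(X_cal, ((pad, pad), (0, 0)), mode="edge")
    out = np.stack([np.tensordot(kernel, X_pad[i:i + k], axes=1)
                    for i in range(X_cal.shape[0])])
    return out, perm

rng = np.random.default_rng(1)
A = (rng.random((10, 10)) < 0.3).astype(float); A = np.triu(A, 1); A = A + A.T
X = rng.normal(size=(10, 3))
out, perm = calibrate_and_convolve(A, X, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (10, 3)
```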
arXiv Detail & Related papers (2024-07-26T03:14:13Z) - Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties, but avoid exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
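A minimal sketch of that linearized setting, assuming a plain batch-normalized linear MLP; the paper's specific weight construction and activation shaping are not reproduced here:

```python
import numpy as np

def bn_linear_mlp(X, weights, eps=1e-5):
    """A deep MLP with linear (identity) activations and batch normalization
    after every linear layer.
    X: (batch, d_in); weights: list of (d_in, d_out) matrices."""
    H = X
    for W in weights:
        H = H @ W
        # batch normalization: center and rescale each feature column over the batch
        H = (H - H.mean(axis=0, keepdims=True)) / (H.std(axis=0, keepdims=True) + eps)
    return H

rng = np.random.default_rng(2)
X = rng.normal(size=(64, 16))
weights = [rng.normal(size=(16, 16)) / np.sqrt(16) for _ in range(50)]
out = bn_linear_mlp(X, weights)
print(out.std(axis=0).round(2))  # columns stay at unit scale even at depth 50
```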
arXiv Detail & Related papers (2023-10-03T12:35:02Z) - Implicit Graph Neural Diffusion Networks: Convergence, Generalization, and Over-Smoothing [7.984586585987328]
Implicit Graph Neural Networks (GNNs) have achieved significant success in addressing graph learning problems.
We introduce a geometric framework for designing implicit graph diffusion layers based on a parameterized graph Laplacian operator.
We show how implicit GNN layers can be viewed as the fixed-point equation of a Dirichlet energy minimization problem.
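A worked sketch of that fixed-point view. The energy used below, $E(Z)=\lVert Z-X\rVert_F^2+\lambda\,\mathrm{tr}(Z^\top L Z)$ with the combinatorial Laplacian, is an illustrative special case; the paper works with a parameterized graph Laplacian operator:

```python
import numpy as np

def dirichlet_fixed_point(A, X, lam=1.0, eps=1e-8):
    """Minimize E(Z) = ||Z - X||_F^2 + lam * tr(Z^T L Z). Setting the gradient
    to zero gives the fixed point Z* = (I + lam * L)^{-1} X, which an implicit
    layer can be viewed as computing."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                        # combinatorial graph Laplacian
    n = A.shape[0]
    Z_closed = np.linalg.solve(np.eye(n) + lam * L, X)

    # equivalent fixed-point / gradient iteration on the same energy
    Z = X.copy()
    eta = 1.0 / (2.0 * (1.0 + lam * np.linalg.eigvalsh(L).max()) + eps)
    for _ in range(2000):
        grad = 2.0 * (Z - X) + 2.0 * lam * (L @ Z)
        Z = Z - eta * grad
    print(np.abs(Z - Z_closed).max())           # iterates converge to the closed form
    return Z

rng = np.random.default_rng(3)
A = (rng.random((15, 15)) < 0.3).astype(float); A = np.triu(A, 1); A = A + A.T
X = rng.normal(size=(15, 2))
dirichlet_fixed_point(A, X)
```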
arXiv Detail & Related papers (2023-08-07T05:22:33Z) - Deep Graph Neural Networks via Posteriori-Sampling-based Node-Adaptive Residual Module [65.81781176362848]
Graph Neural Networks (GNNs) can learn from graph-structured data through neighborhood information aggregation.
As the number of layers increases, node representations become indistinguishable, which is known as over-smoothing.
We propose a Posterior-Sampling-based, Node-distinguishing Residual module (PSNR).
arXiv Detail & Related papers (2023-05-09T12:03:42Z) - Distributional Signals for Node Classification in Graph Neural Networks [36.30743671968087]
In graph neural networks (GNNs), both node features and labels are examples of graph signals, a key notion in graph signal processing (GSP).
In our framework, we work with the distributions of node labels instead of their values and propose notions of smoothness and non-uniformity of such distributional graph signals.
We then propose a general regularization method for GNNs that allows us to encode distributional smoothness and non-uniformity of the model output in semi-supervised node classification tasks.
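One hedged way such regularizers can be computed (the quadratic-form smoothness term and negative-entropy non-uniformity measure below are illustrative choices consistent with the GSP notions named above, not necessarily the paper's exact definitions):

```python
import numpy as np

def distributional_regularizers(A, P, eps=1e-12):
    """P: (n, c) row-stochastic matrix of predicted label distributions.
    Returns (smoothness, non_uniformity) measures of the output distributions;
    how they enter the training loss (signs, weights) is up to the practitioner."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    # smoothness: neighboring nodes should carry similar label distributions
    smoothness = np.trace(P.T @ L @ P)
    # non-uniformity: negative mean entropy, larger when each node's
    # distribution is more peaked (less uniform)
    non_uniformity = np.mean(np.sum(P * np.log(P + eps), axis=1))
    return smoothness, non_uniformity

rng = np.random.default_rng(4)
A = (rng.random((12, 12)) < 0.3).astype(float); A = np.triu(A, 1); A = A + A.T
logits = rng.normal(size=(12, 3))
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(distributional_regularizers(A, P))
```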
arXiv Detail & Related papers (2023-04-07T06:54:42Z) - ResNorm: Tackling Long-tailed Degree Distribution Issue in Graph Neural Networks via Normalization [80.90206641975375]
This paper focuses on improving the performance of GNNs via normalization.
By studying the long-tailed distribution of node degrees in the graph, we propose a novel normalization method for GNNs.
The scale operation of ResNorm reshapes the node-wise standard deviation (NStd) distribution so as to improve the accuracy of tail nodes.
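A hypothetical sketch of a node-wise-std (NStd) reshaping step in that spirit; the power-transform rule and the `tau` parameter below are assumptions for illustration, not ResNorm's actual scale operation:

```python
import numpy as np

def nstd_reshape(X, tau=0.5, eps=1e-8):
    """Rescale each node's feature vector so that the distribution of
    node-wise standard deviations (NStd) is pulled toward the mean NStd,
    the kind of reshaping aimed at helping low-degree (tail) nodes."""
    nstd = X.std(axis=1, keepdims=True) + eps        # (n, 1) node-wise std
    mean_nstd = nstd.mean()
    # power transform pulls small/large NStd values toward the global mean
    target = mean_nstd * (nstd / mean_nstd) ** tau
    return X * (target / nstd)

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 16)) * rng.uniform(0.1, 3.0, size=(8, 1))
print(X.std(axis=1).round(2))
print(nstd_reshape(X).std(axis=1).round(2))  # spread of NStd values is compressed
```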
arXiv Detail & Related papers (2022-06-16T13:49:09Z) - Can Hybrid Geometric Scattering Networks Help Solve the Maximal Clique Problem? [12.953897563456271]
We propose a geometric scattering-based graph neural network (GNN) for approximating solutions of the NP-hard maximal clique (MC) problem.
Our empirical results demonstrate that our method outperforms representative GNN baselines in terms of solution accuracy and inference speed.
arXiv Detail & Related papers (2022-06-03T11:09:03Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks [172.18295279061607]
Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities.
In this work, we approach GCNNs from a state-space perspective revealing that the graph convolutional module is a minimalistic linear state-space model.
We show that this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish.
We propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear state-space parametric fashion allowing for a better trade-off.
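A small sketch of the state-space reading: the graph convolution $\sum_k S^k X H_k$ unrolled as a recursion whose state update $z_{k+1} = S z_k$ is nonparametric, so it can explode or vanish depending on the spectrum of $S$. The normalization of $S$ below is an assumption for the toy example:

```python
import numpy as np

def graph_convolution_as_state_space(S, X, H_list):
    """Graph convolution written as a state-space recursion: the state evolves
    as z_{k+1} = S z_k (only the shift operator S acts on it), and the output
    accumulates y = sum_k z_k H_k. Eigenvalues of S far from magnitude 1 make
    the state explode or vanish with the filter order."""
    z = X
    y = np.zeros((X.shape[0], H_list[0].shape[1]))
    for H_k in H_list:
        y = y + z @ H_k
        z = S @ z              # nonparametric state update
    return y

rng = np.random.default_rng(6)
A = (rng.random((10, 10)) < 0.3).astype(float); A = np.triu(A, 1); A = A + A.T
S = A / (np.abs(np.linalg.eigvalsh(A)).max() + 1e-8)   # spectrally normalized shift
X = rng.normal(size=(10, 4))
H_list = [rng.normal(size=(4, 2)) / 2 for _ in range(5)]
print(graph_convolution_as_state_space(S, X, H_list).shape)  # (10, 2)
```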
arXiv Detail & Related papers (2020-10-27T19:48:56Z) - Graphical Normalizing Flows [11.23030807455021]
Normalizing flows model complex probability distributions by combining a base distribution with a series of neural networks.
State-of-the-art architectures rely on coupling and autoregressive transformations to lift up invertible functions from scalars to vectors.
We propose the graphical normalizing flow, a new invertible transformation with either a prescribed or a learnable graphical structure.
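A minimal, hypothetical sketch of an invertible affine transformation whose conditioner follows a prescribed DAG (the learnable-structure and monotone variants from the paper are not shown; the linear conditioner below is purely illustrative):

```python
import numpy as np

def graphical_affine_flow(x, parents, weights):
    """One affine transformation whose conditioner follows a prescribed DAG:
    variable i is shifted and scaled using only its parent variables, so the
    Jacobian is triangular under a topological ordering and the map is
    invertible. parents[i] lists the parent indices of variable i;
    weights[i] is a hypothetical linear conditioner for variable i."""
    z = np.empty_like(x)
    log_det = 0.0
    for i in range(len(x)):
        pa = x[parents[i]]                     # parent values of variable i
        shift = weights[i] @ pa if len(pa) else 0.0
        log_scale = 0.1 * shift                # toy conditioner for the scale
        z[i] = (x[i] - shift) * np.exp(-log_scale)
        log_det += -log_scale
    return z, log_det

# toy DAG on 3 variables: 0 -> 1 -> 2
x = np.array([0.5, -1.2, 2.0])
parents = {0: np.array([], dtype=int), 1: np.array([0]), 2: np.array([1])}
weights = {0: np.array([]), 1: np.array([0.7]), 2: np.array([-0.3])}
print(graphical_affine_flow(x, parents, weights))
```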
arXiv Detail & Related papers (2020-06-03T21:50:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.