Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
- URL: http://arxiv.org/abs/2210.08122v1
- Date: Fri, 14 Oct 2022 21:30:25 GMT
- Title: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
- Authors: Ajay Jaiswal, Peihao Wang, Tianlong Chen, Justin F. Rousseau, Ying
Ding, Zhangyang Wang
- Abstract summary: We provide a new gradient-flow perspective to understand the substandard performance of deep GCNs.
We propose gradient-guided dynamic rewiring of vanilla-GCNs with skip connections.
Our methods significantly boost their performance, allowing them to comfortably compete with and outperform many state-of-the-art methods.
- Score: 96.4999517230259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the enormous success of Graph Convolutional Networks (GCNs) in
modeling graph-structured data, most current GCNs are shallow due to the
notoriously challenging problems of over-smoothing and information squashing,
along with the conventional difficulties of vanishing gradients and
over-fitting. Previous work has primarily focused on studying the
over-smoothing and over-squashing phenomena in training deep GCNs.
Surprisingly, in comparison with CNNs/RNNs, very limited attention has been
given to understanding how healthy gradient flow can benefit the trainability
of deep GCNs. In this paper, we first provide a new gradient-flow perspective
to understand the substandard performance of deep GCNs and hypothesize that by
facilitating healthy gradient flow, we can significantly improve their
trainability and achieve state-of-the-art (SOTA) level performance from
vanilla-GCNs. Next, we argue that blindly adopting the Glorot initialization
for GCNs is not optimal, and we derive a topology-aware isometric initialization
scheme for vanilla-GCNs based on the principles of isometry. Additionally, in
contrast to the ad-hoc addition of skip connections, we propose gradient-guided
dynamic rewiring of vanilla-GCNs with skip connections. Our
dynamic rewiring method uses the gradient flow within each layer during
training to introduce on-demand skip-connections adaptively. We provide
extensive empirical evidence across multiple datasets that our methods improve
gradient flow in deep vanilla-GCNs and significantly boost their performance,
allowing them to comfortably compete with and outperform many state-of-the-art
methods. Code is available at: https://github.com/VITA-Group/GradientGCN.
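The abstract does not spell out the exact rewiring criterion, so the following is only a minimal PyTorch-style sketch of the idea: a vanilla GCN layer whose skip connection is switched on during training once the gradient flowing through that layer becomes weak. The layer structure, the gradient-norm test, and the threshold are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Vanilla GCN layer with an optional, initially disabled skip connection."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim, bias=False)
        self.use_skip = False  # flipped on by the rewiring rule below

    def forward(self, adj_norm, x):
        out = torch.relu(adj_norm @ self.linear(x))
        return out + x if self.use_skip else out


def rewire_on_demand(layers, grad_threshold=1e-3):
    """Hypothetical gradient-guided rewiring: after a backward pass, enable the
    skip connection of any layer whose weight-gradient norm has collapsed."""
    for layer in layers:
        g = layer.linear.weight.grad
        if g is not None and not layer.use_skip and g.norm() < grad_threshold:
            layer.use_skip = True
```

In a training loop, rewire_on_demand would be called after loss.backward() and before optimizer.step(), so that skip connections are introduced adaptively where gradient flow is poor; the actual criterion and the topology-aware isometric initialization are detailed in the paper.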
Related papers
- Graph Neural Networks Do Not Always Oversmooth [46.57665708260211]
We study oversmoothing in graph convolutional networks (GCNs) by using their Gaussian process (GP) equivalence in the limit of infinitely many hidden features.
We identify a new, non-oversmoothing phase: if the initial weights of the network have sufficiently large variance, GCNs do not oversmooth, and node features remain informative even at large depth.
arXiv Detail & Related papers (2024-06-04T12:47:13Z)
- New Insights into Graph Convolutional Networks using Neural Tangent Kernels [8.824340350342512]
This paper focuses on semi-supervised learning on graphs and explains empirical observations about the effect of network depth through the lens of Neural Tangent Kernels (NTKs).
We derive NTKs corresponding to infinitely wide GCNs (with and without skip connections).
We use the derived NTKs to identify that, with suitable normalisation, network depth does not always drastically reduce the performance of GCNs.
arXiv Detail & Related papers (2021-10-08T15:36:52Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Automated Graph Learning via Population Based Self-Tuning GCN [45.28411311903644]
Graph convolutional network (GCN) and its variants have been successfully applied to a broad range of tasks.
Traditional GCN models suffer from the issues of overfitting and oversmoothing.
Recent techniques like DropEdge can alleviate these issues and thus enable the development of deep GCNs (a minimal sketch is given below).
arXiv Detail & Related papers (2021-07-09T23:05:21Z)
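For context on the DropEdge trick mentioned above, here is a minimal, self-contained sketch of random edge dropping; the edge-list representation and the default drop rate are illustrative assumptions rather than the configuration used in that paper.

```python
import torch

def drop_edge(edge_index, drop_rate=0.2, training=True):
    """DropEdge-style regularization: randomly remove a fraction of edges
    on each forward pass during training.

    edge_index: LongTensor of shape [2, num_edges] holding (source, target) pairs.
    """
    if not training or drop_rate == 0.0:
        return edge_index
    keep_mask = torch.rand(edge_index.size(1)) >= drop_rate
    return edge_index[:, keep_mask]
```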
- Dissecting the Diffusion Process in Linear Graph Convolutional Networks [71.30132908130581]
Graph Convolutional Networks (GCNs) have attracted more and more attention in recent years.
Recent works show that a linear GCN can achieve comparable performance to the original non-linear GCN.
We propose Decoupled Graph Convolution (DGC), which decouples the terminal diffusion time from the number of feature propagation steps (a minimal sketch is given below).
arXiv Detail & Related papers (2021-02-22T02:45:59Z)
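A minimal sketch of that decoupling, assuming the usual view of a linear GCN as graph diffusion: the terminal time T is fixed independently of the number of propagation steps K, and each step advances the diffusion by T / K. The exact update used in the paper may differ.

```python
import torch

def dgc_propagate(x, adj_norm, terminal_time=5.0, num_steps=10):
    """Decoupled propagation: K Euler steps of size T / K of the diffusion
    dx/dt = (A_norm - I) x, so T and K can be tuned independently.

    x: node features [N, d]; adj_norm: normalized adjacency matrix [N, N].
    """
    step = terminal_time / num_steps
    for _ in range(num_steps):
        x = (1.0 - step) * x + step * (adj_norm @ x)
    return x
```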
- DeeperGCN: All You Need to Train Deeper GCNs [66.64739331859226]
Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.
Unlike Convolutional Neural Networks (CNNs), which are able to take advantage of stacking very deep layers, GCNs suffer from vanishing gradient, over-smoothing and over-fitting issues when going deeper.
This paper proposes DeeperGCN that is capable of successfully and reliably training very deep GCNs.
arXiv Detail & Related papers (2020-06-13T23:00:22Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code (see the sketch below).
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
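As a concrete reference for the operation described above, a minimal sketch of gradient centralization for a weight-matrix gradient; taking the mean over all dimensions except the output dimension matches the commonly used formulation, but treat the details as an assumption here.

```python
import torch

def centralize_gradient(grad):
    """Gradient centralization: subtract, from a weight gradient, its mean taken
    over every dimension except the first (output) dimension, so the gradient of
    each output neuron has zero mean. 1-D (bias) gradients are left unchanged."""
    if grad.dim() < 2:
        return grad
    dims = tuple(range(1, grad.dim()))
    return grad - grad.mean(dim=dims, keepdim=True)
```

In practice this would be applied to each parameter's .grad after loss.backward() and before the optimizer step, which is the sense in which GC can be added with a single line of code.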
- LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation [100.76229017056181]
Graph Convolution Network (GCN) has become the new state of the art for collaborative filtering.
In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation.
We propose a new model named LightGCN, which keeps only the most essential component of GCN, neighborhood aggregation (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-06T06:53:42Z)
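Below is a minimal sketch of the neighborhood-aggregation-only propagation that the LightGCN entry above refers to; the uniform averaging of per-layer embeddings follows the common default, and the tensor shapes are illustrative assumptions.

```python
import torch

def lightgcn_propagate(emb0, adj_norm, num_layers=3):
    """LightGCN-style propagation: repeated neighborhood aggregation with no
    feature transform or non-linearity, followed by a uniform average of the
    embeddings produced at every layer.

    emb0: initial user/item embeddings [N, d]; adj_norm: normalized adjacency [N, N].
    """
    embs = [emb0]
    x = emb0
    for _ in range(num_layers):
        x = adj_norm @ x  # pure neighborhood aggregation
        embs.append(x)
    return torch.stack(embs, dim=0).mean(dim=0)
```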
This list is automatically generated from the titles and abstracts of the papers in this site.