Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
- URL: http://arxiv.org/abs/2210.08122v1
- Date: Fri, 14 Oct 2022 21:30:25 GMT
- Title: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
- Authors: Ajay Jaiswal, Peihao Wang, Tianlong Chen, Justin F. Rousseau, Ying
Ding, Zhangyang Wang
- Abstract summary: We provide a new gradient-flow perspective for understanding the substandard performance of deep GCNs.
We propose gradient-guided dynamic rewiring of vanilla-GCNs with skip connections.
Our methods significantly boost their performance to comfortably compete with, and outperform, many fancy state-of-the-art methods.
- Score: 96.4999517230259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the enormous success of Graph Convolutional Networks (GCNs) in
modeling graph-structured data, most current GCNs are shallow due to the
notoriously challenging problems of over-smoothing and information squashing,
along with the conventional difficulties caused by vanishing gradients and
over-fitting. Previous works have primarily focused on studying the
over-smoothing and over-squashing phenomena in training deep GCNs.
Surprisingly, in comparison with CNNs/RNNs, very limited attention has been
given to understanding how healthy gradient flow can benefit the trainability
of deep GCNs. In this paper, we first provide a new gradient-flow perspective
for understanding the substandard performance of deep GCNs and hypothesize
that by facilitating healthy gradient flow, we can significantly improve their
trainability, as well as achieve state-of-the-art (SOTA) level performance from
vanilla-GCNs. Next, we argue that blindly adopting the Glorot initialization
for GCNs is not optimal, and derive a topology-aware isometric initialization
scheme for vanilla-GCNs based on the principles of isometry. Additionally,
contrary to ad-hoc addition of skip-connections, we propose to use
gradient-guided dynamic rewiring of vanilla-GCNs with skip connections. Our
dynamic rewiring method uses the gradient flow within each layer during
training to introduce on-demand skip-connections adaptively. We provide
extensive empirical evidence across multiple datasets that our methods improve
gradient flow in deep vanilla-GCNs and significantly boost their performance to
comfortably compete with, and outperform, many fancy state-of-the-art methods. Codes
are available at: https://github.com/VITA-Group/GradientGCN.
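The abstract names two mechanisms, a topology-aware isometric initialization and gradient-guided dynamic rewiring, without spelling either one out. As a rough illustration of the initialization idea only (the scaling rule, function name, and use of the adjacency's Frobenius norm below are assumptions, not the paper's derivation), one could rescale a Glorot-style variance by the average gain of the normalized adjacency so that the combined propagation-plus-transformation map stays roughly norm-preserving:

```python
import math
import torch


def topology_aware_glorot_(weight: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
    """Illustrative isometry-motivated initialization (NOT the paper's exact scheme).

    A GCN layer computes A_hat @ X @ W.  Plain Glorot only balances X @ W, so here
    the Glorot std is divided by an estimate of how much A_hat rescales feature
    norms, keeping the whole layer approximately isometric at initialization.
    """
    fan_in, fan_out = weight.shape          # weight used as X @ W, shape (in_dim, out_dim)
    n = adj_norm.shape[0]
    # For unit-variance inputs, E||A_hat x||^2 / E||x||^2 = ||A_hat||_F^2 / n.
    adj_gain = adj_norm.pow(2).sum().item() / n
    std = math.sqrt(2.0 / (fan_in + fan_out)) / math.sqrt(max(adj_gain, 1e-12))
    with torch.no_grad():
        return weight.normal_(0.0, std)
```

The gradient-guided rewiring is likewise described only at a high level ("on-demand skip-connections" driven by per-layer gradient flow during training). A minimal interpretation, with an assumed threshold rule and gating mechanism, is to switch on a residual connection for any layer whose weight-gradient norm falls well below the layer-wise average:

```python
import torch
import torch.nn as nn


class GatedSkipGCNLayer(nn.Module):
    """Vanilla GCN layer whose skip connection can be switched on during training."""

    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.weight)
        self.use_skip = False               # rewired on demand

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        out = torch.relu(adj_norm @ x @ self.weight)
        return out + x if self.use_skip else out


def rewire_on_weak_gradients(layers, rel_threshold: float = 0.1) -> None:
    """Call after loss.backward(): enable a skip wherever the gradient norm is
    far below the average across layers (illustrative rule, not the paper's)."""
    grads = [(layer, layer.weight.grad.norm().item())
             for layer in layers if layer.weight.grad is not None]
    if not grads:
        return
    mean_norm = sum(g for _, g in grads) / len(grads)
    for layer, g in grads:
        if not layer.use_skip and g < rel_threshold * mean_norm:
            layer.use_skip = True
```

In a training loop, one would call rewire_on_weak_gradients(model_layers) after back-propagation every few epochs, so skips are introduced only where gradient flow has actually degraded rather than being added uniformly ahead of time.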
Related papers
- Graph Neural Networks Do Not Always Oversmooth [46.57665708260211]
We study oversmoothing in graph convolutional networks (GCNs) by using their Gaussian process (GP) equivalence in the limit of infinitely many hidden features.
We identify a new, non-oversmoothing phase: if the initial weights of the network have sufficiently large variance, GCNs do not oversmooth, and node features remain informative even at large depth.
arXiv Detail & Related papers (2024-06-04T12:47:13Z)
- New Insights into Graph Convolutional Networks using Neural Tangent Kernels [8.824340350342512]
This paper focuses on semi-supervised learning on graphs, and explains the above observations through the lens of Neural Tangent Kernels (NTKs).
We derive NTKs corresponding to infinitely wide GCNs (with and without skip connections).
We use the derived NTKs to identify that, with suitable normalisation, network depth does not always drastically reduce the performance of GCNs.
arXiv Detail & Related papers (2021-10-08T15:36:52Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Automated Graph Learning via Population Based Self-Tuning GCN [45.28411311903644]
Graph convolutional network (GCN) and its variants have been successfully applied to a broad range of tasks.
Traditional GCN models suffer from the issues of overfitting and oversmoothing.
Recent techniques like DropEdge could alleviate these issues and thus enable the development of deep GCNs (a sketch of the edge-dropping step follows after this list).
arXiv Detail & Related papers (2021-07-09T23:05:21Z)
- Dissecting the Diffusion Process in Linear Graph Convolutional Networks [71.30132908130581]
Graph Convolutional Networks (GCNs) have attracted increasing attention in recent years.
Recent works show that a linear GCN can achieve comparable performance to the original non-linear GCN.
We propose Decoupled Graph Convolution (DGC) that decouples the terminal time and the feature propagation steps.
arXiv Detail & Related papers (2021-02-22T02:45:59Z)
- DeeperGCN: All You Need to Train Deeper GCNs [66.64739331859226]
Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.
Unlike Convolutional Neural Networks (CNNs), which are able to take advantage of stacking very deep layers, GCNs suffer from vanishing gradient, over-smoothing and over-fitting issues when going deeper.
This paper proposes DeeperGCN that is capable of successfully and reliably training very deep GCNs.
arXiv Detail & Related papers (2020-06-13T23:00:22Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient Centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean (a sketch of the operation follows after this list).
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient based DNNs with only one line of code.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
- LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation [100.76229017056181]
Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering.
In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation.
We propose a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation.
arXiv Detail & Related papers (2020-02-06T06:53:42Z)
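As referenced from the Self-Tuning GCN entry above, DropEdge alleviates over-fitting and over-smoothing by randomly removing a fraction of edges at each training epoch. A minimal sketch of that edge-dropping step (the full method typically also re-normalizes the adjacency after dropping, which is omitted here):

```python
import torch


def drop_edge(edge_index: torch.Tensor, drop_rate: float = 0.2) -> torch.Tensor:
    """Randomly drop a fraction of edges from a (2, num_edges) COO edge list.

    Applied once per training epoch/step; evaluation uses the full graph.
    """
    keep_mask = torch.rand(edge_index.shape[1]) >= drop_rate
    return edge_index[:, keep_mask]
```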
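The Gradient Centralization entry above describes the operation concretely: centralize each weight gradient so its vectors have zero mean, as a one-line addition to any gradient-based optimizer. A minimal sketch of that operation as summarized (applied only to gradients with two or more dimensions):

```python
import torch


def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Remove the per-slice mean from a weight gradient (Gradient Centralization).

    For a matrix gradient this subtracts each row's mean; 1-D gradients
    (e.g. biases) are left untouched.
    """
    if grad.dim() > 1:
        dims = tuple(range(1, grad.dim()))
        grad = grad - grad.mean(dim=dims, keepdim=True)
    return grad
```

In practice this is applied to each parameter's gradient just before the optimizer step, e.g. p.grad = centralize_gradient(p.grad).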