On Provable Benefits of Depth in Training Graph Convolutional Networks
- URL: http://arxiv.org/abs/2110.15174v1
- Date: Thu, 28 Oct 2021 14:50:47 GMT
- Title: On Provable Benefits of Depth in Training Graph Convolutional Networks
- Authors: Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
- Abstract summary: Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases.
We argue that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs.
- Score: 13.713485304798368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing. Despite the apparent consensus, we observe that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs. Specifically, we argue that over-smoothing does not necessarily happen in practice: a deeper model is provably expressive, can converge to the global optimum at a linear rate, and can achieve very high training accuracy as long as it is properly trained. Despite being capable of achieving high training accuracy, empirical results show that deeper models generalize poorly at test time, and a theoretical understanding of this behavior remains elusive. To gain a better understanding, we carefully analyze the generalization capability of GCNs and show that the training strategies used to achieve high training accuracy significantly deteriorate their generalization capability. Motivated by these findings, we propose a decoupled structure for GCNs that detaches weight matrices from feature propagation to preserve the expressive power and ensure good generalization performance. We conduct empirical evaluations on various synthetic and real-world datasets to validate the correctness of our theory.
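As a reading aid, below is a minimal sketch of what such a decoupled design could look like. It is not the authors' released implementation: the dense adjacency handling, the number of propagation steps K, and the two-layer MLP predictor are illustrative assumptions.

```python
# Minimal sketch of a "decoupled" GCN: parameter-free feature propagation
# A_hat^K X is computed once, and a separate MLP is trained on the result.
# All names and hyperparameters here are illustrative, not the paper's code.
import torch
import torch.nn as nn


def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + torch.eye(adj.size(0))
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)


def propagate(features: torch.Tensor, adj_hat: torch.Tensor, k: int) -> torch.Tensor:
    """Parameter-free K-step propagation: X <- A_hat^K X (no trainable weights)."""
    for _ in range(k):
        features = adj_hat @ features
    return features


class DecoupledGCN(nn.Module):
    """Weights are detached from propagation: the network only sees the
    pre-propagated features, so deeper propagation adds no trainable layers."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, propagated_x: torch.Tensor) -> torch.Tensor:
        return self.mlp(propagated_x)


# Toy usage: 4 nodes, 3 features, 2 classes, K = 8 propagation steps.
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
x = torch.randn(4, 3)
x_prop = propagate(x, normalize_adj(adj), k=8)   # done once, outside training
model = DecoupledGCN(in_dim=3, hidden_dim=16, num_classes=2)
logits = model(x_prop)                           # gradients never touch propagation
```

In this sketch, propagation carries no trainable weights, so increasing K enlarges the receptive field without adding layers to optimize; this is one common way of reading "detaches weight matrices from feature propagation."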
Related papers
- Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization [7.523648394276968]
Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks.
We study the stability and generalization properties of deep GCNs.
arXiv Detail & Related papers (2024-10-11T02:57:47Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Can overfitted deep neural networks in adversarial training generalize?
-- An approximation viewpoint [25.32729343174394]
Adversarial training is a widely used method to improve the robustness of deep neural networks (DNNs) against adversarial perturbations.
In this paper, we provide a theoretical understanding of whether overfitted DNNs in adversarial training can generalize from an approximation viewpoint.
arXiv Detail & Related papers (2024-01-24T17:54:55Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse
Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continued training can still improve accuracy on the test set, even after the training accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - CARE: Certifiably Robust Learning with Reasoning via Variational
Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE).
CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines.
We additionally conduct ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different forms of knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - CAP: Co-Adversarial Perturbation on Weights and Features for Improving
Generalization of Graph Neural Networks [59.692017490560275]
Adversarial training has been widely demonstrated to improve a model's robustness against adversarial attacks.
It remains unclear how adversarial training could improve the generalization abilities of GNNs on graph analytics problems.
We construct the co-adversarial perturbation (CAP) optimization problem in terms of weights and features, and design an alternating adversarial perturbation algorithm to flatten the weight and feature loss landscapes alternately (a generic sketch of this alternating-perturbation idea is given after this list).
arXiv Detail & Related papers (2021-10-28T02:28:13Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on-par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Wide Graph Neural Networks: Aggregation Provably Leads to Exponentially
Trainability Loss [17.39060566854841]
Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data.
It is well known that deep GCNs suffer from the over-smoothing problem.
Few theoretical analyses have been conducted to study the expressivity and trainability of deep GCNs.
arXiv Detail & Related papers (2021-03-03T11:06:12Z) - Optimization and Generalization Analysis of Transduction through
Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)
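For the CAP entry above, here is a minimal, assumption-laden sketch of the alternating weight/feature perturbation idea. The model signature `model(x, adj)`, the step sizes `eps_x`/`eps_w`, and the SAM-style normalized weight step are illustrative choices, not the paper's exact algorithm.

```python
# Generic alternating perturbation step (illustrative only): perturb the node
# features, then the weights, then descend at the perturbed point.
import torch
import torch.nn.functional as F


def cap_style_step(model, x, adj, y, optimizer, eps_x=0.01, eps_w=0.01):
    # Feature-space perturbation: one signed-gradient ascent step on the inputs.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_x = F.cross_entropy(model(x_adv, adj), y)
    grad_x = torch.autograd.grad(loss_x, x_adv)[0]
    x_adv = (x + eps_x * grad_x.sign()).detach()

    # Weight-space perturbation: a normalized ascent step on the parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    loss_w = F.cross_entropy(model(x_adv, adj), y)
    grads = torch.autograd.grad(loss_w, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(eps_w * g / (g.norm() + 1e-12))   # climb the weight loss landscape

    # Descend at the perturbed point, then undo the weight perturbation and update.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv, adj), y)
    loss.backward()
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(eps_w * g / (g.norm() + 1e-12))   # restore the original weights
    optimizer.step()
    return loss.item()
```

This sketch folds one feature step and one weight step into each update; whether that matches CAP's exact schedule is an assumption made for brevity.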
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.