On Provable Benefits of Depth in Training Graph Convolutional Networks
- URL: http://arxiv.org/abs/2110.15174v1
- Date: Thu, 28 Oct 2021 14:50:47 GMT
- Title: On Provable Benefits of Depth in Training Graph Convolutional Networks
- Authors: Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi
- Abstract summary: Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases.
We argue that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs.
- Score: 13.713485304798368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing. Despite the apparent consensus, we observe that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs. Specifically, we argue that over-smoothing does not necessarily happen in practice: a deeper model is provably expressive, can converge to the global optimum at a linear rate, and can achieve very high training accuracy as long as it is properly trained. Despite being capable of achieving high training accuracy, empirical results show that deeper models generalize poorly at test time, and a theoretical understanding of this behavior remains elusive. To gain a better understanding, we carefully analyze the generalization capability of GCNs and show that the training strategies used to achieve high training accuracy significantly deteriorate their generalization capability. Motivated by these findings, we propose a decoupled structure for GCNs that detaches weight matrices from feature propagation to preserve the expressive power and ensure good generalization performance. We conduct empirical evaluations on various synthetic and real-world datasets to validate the correctness of our theory.
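As a reading aid, below is a minimal sketch of what such a decoupled design could look like. It is not the authors' released implementation: the dense adjacency handling, the number of propagation steps K, and the two-layer MLP predictor are illustrative assumptions.

```python
# Minimal sketch of a "decoupled" GCN: parameter-free feature propagation
# A_hat^K X is computed once, and a separate MLP is trained on the result.
# All names and hyperparameters here are illustrative, not the paper's code.
import torch
import torch.nn as nn


def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + torch.eye(adj.size(0))
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)


def propagate(features: torch.Tensor, adj_hat: torch.Tensor, k: int) -> torch.Tensor:
    """Parameter-free K-step propagation: X <- A_hat^K X (no trainable weights)."""
    for _ in range(k):
        features = adj_hat @ features
    return features


class DecoupledGCN(nn.Module):
    """Weights are detached from propagation: the network only sees the
    pre-propagated features, so deeper propagation adds no trainable layers."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, propagated_x: torch.Tensor) -> torch.Tensor:
        return self.mlp(propagated_x)


# Toy usage: 4 nodes, 3 features, 2 classes, K = 8 propagation steps.
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
x = torch.randn(4, 3)
x_prop = propagate(x, normalize_adj(adj), k=8)   # done once, outside training
model = DecoupledGCN(in_dim=3, hidden_dim=16, num_classes=2)
logits = model(x_prop)                           # gradients never touch propagation
```

In this sketch, propagation carries no trainable weights, so increasing K enlarges the receptive field without adding layers to optimize; this is one common way of reading "detaches weight matrices from feature propagation."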
Related papers
- Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization [7.523648394276968]
Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks.
We study the stability and generalization properties of deep GCNs.
arXiv Detail & Related papers (2024-10-11T02:57:47Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Can overfitted deep neural networks in adversarial training generalize?
-- An approximation viewpoint [25.32729343174394]
Adversarial training is a widely used method to improve the robustness of deep neural networks (DNNs) against adversarial perturbations.
In this paper, we provide a theoretical understanding of whether overfitted DNNs in adversarial training can generalize from an approximation viewpoint.
arXiv Detail & Related papers (2024-01-24T17:54:55Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse
Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continued training can still improve accuracy on the test set, even after the training accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - CARE: Certifiably Robust Learning with Reasoning via Variational
Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE).
CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines.
We additionally conduct ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different forms of knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - CAP: Co-Adversarial Perturbation on Weights and Features for Improving
Generalization of Graph Neural Networks [59.692017490560275]
Adversarial training has been widely demonstrated to improve a model's robustness against adversarial attacks.
It remains unclear how adversarial training could improve the generalization abilities of GNNs on graph analytics problems.
We construct the co-adversarial perturbation (CAP) optimization problem in terms of weights and features, and design an alternating adversarial perturbation algorithm to flatten the weight and feature loss landscapes alternately (a generic sketch of this alternating-perturbation idea is given after this list).
arXiv Detail & Related papers (2021-10-28T02:28:13Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on-par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Wide Graph Neural Networks: Aggregation Provably Leads to Exponentially
Trainability Loss [17.39060566854841]
Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data.
It is well known that deep GCNs suffer from the over-smoothing problem.
Few theoretical analyses have been conducted to study the expressivity and trainability of deep GCNs.
arXiv Detail & Related papers (2021-03-03T11:06:12Z) - Optimization and Generalization Analysis of Transduction through
Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)
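For the CAP entry above, here is a minimal, assumption-laden sketch of the alternating weight/feature perturbation idea. The model signature `model(x, adj)`, the step sizes `eps_x`/`eps_w`, and the SAM-style normalized weight step are illustrative choices, not the paper's exact algorithm.

```python
# Generic alternating perturbation step (illustrative only): perturb the node
# features, then the weights, then descend at the perturbed point.
import torch
import torch.nn.functional as F


def cap_style_step(model, x, adj, y, optimizer, eps_x=0.01, eps_w=0.01):
    # Feature-space perturbation: one signed-gradient ascent step on the inputs.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_x = F.cross_entropy(model(x_adv, adj), y)
    grad_x = torch.autograd.grad(loss_x, x_adv)[0]
    x_adv = (x + eps_x * grad_x.sign()).detach()

    # Weight-space perturbation: a normalized ascent step on the parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    loss_w = F.cross_entropy(model(x_adv, adj), y)
    grads = torch.autograd.grad(loss_w, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(eps_w * g / (g.norm() + 1e-12))   # climb the weight loss landscape

    # Descend at the perturbed point, then undo the weight perturbation and update.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv, adj), y)
    loss.backward()
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(eps_w * g / (g.norm() + 1e-12))   # restore the original weights
    optimizer.step()
    return loss.item()
```

This sketch folds one feature step and one weight step into each update; whether that matches CAP's exact schedule is an assumption made for brevity.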
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.