How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks
- URL: http://arxiv.org/abs/2009.11848v5
- Date: Tue, 2 Mar 2021 23:05:49 GMT
- Title: How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks
- Authors: Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi
Kawarabayashi, Stefanie Jegelka
- Abstract summary: We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. While feedforward networks (MLPs) do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) have shown some success in more complex tasks.
- Score: 80.55378250013496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how neural networks trained by gradient descent extrapolate, i.e.,
what they learn outside the support of the training distribution. Previous
works report mixed empirical results when extrapolating with neural networks:
while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not
extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) --
structured networks with MLP modules -- have shown some success in more complex
tasks. Working towards a theoretical explanation, we identify conditions under
which MLPs and GNNs extrapolate well. First, we quantify the observation that
ReLU MLPs quickly converge to linear functions along any direction from the
origin, which implies that ReLU MLPs do not extrapolate most nonlinear
functions. But, they can provably learn a linear target function when the
training distribution is sufficiently "diverse". Second, in connection to
analyzing the successes and limitations of GNNs, these results suggest a
hypothesis for which we provide theoretical and empirical evidence: the success
of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or
edge weights) relies on encoding task-specific non-linearities in the
architecture or features. Our theoretical analysis builds on a connection of
over-parameterized networks to the neural tangent kernel. Empirically, our
theory holds across different training settings.
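The first claim above, that a ReLU MLP becomes linear along directions away from the origin, is easy to probe numerically. The following is a minimal sketch (not the authors' code; the architecture and hyperparameters are arbitrary illustrative choices): it fits a small ReLU MLP to the quadratic y = x^2 on [-1, 1] and then checks finite-difference slopes far outside the training range. Roughly constant slopes there correspond to the approximately linear extrapolation the abstract describes, which is why the nonlinear target is missed outside the training support.
```python
# Minimal sketch (not from the paper): probing how a ReLU MLP extrapolates
# outside its training support. Architecture and hyperparameters are
# illustrative choices, not the authors' experimental setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Nonlinear target y = x^2, observed only on the training range [-1, 1].
x_train = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
y_train = x_train ** 2

mlp = nn.Sequential(
    nn.Linear(1, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(x_train), y_train)
    loss.backward()
    opt.step()

# Far outside [-1, 1] the true slope of x^2 keeps growing, but the MLP's
# finite-difference slope flattens to a near-constant value: along this
# direction the learned function is approximately linear.
with torch.no_grad():
    x_far = torch.tensor([[2.0], [4.0], [8.0], [16.0]])
    preds = mlp(x_far).squeeze(1)
    gaps = x_far.squeeze(1)[1:] - x_far.squeeze(1)[:-1]
    slopes = (preds[1:] - preds[:-1]) / gaps
    print("predictions outside training range:", preds.tolist())
    print("finite-difference slopes (near-constant => linear):", slopes.tolist())
```
In line with the abstract's second point, the same probe can be repeated with a linear target (e.g., y = 2x) on a sufficiently diverse training distribution, in which case the extrapolated predictions remain accurate.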
Related papers
- Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain as their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only later in training exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can achieve minimax-optimal rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
- Analytic Learning of Convolutional Neural Network For Pattern Recognition [20.916630175697065]
Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive.
We propose an analytic convolutional neural network learning method (ACnnL).
ACnnL builds a closed-form solution similar to its counterpart, but differs in its regularization constraints.
arXiv Detail & Related papers (2022-02-14T06:32:21Z)
- Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth [57.10183643449905]
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization.
We study the training dynamics of GNNs, focusing on how skip connections and greater depth affect optimization.
Our results provide the first theoretical support for the success of GNNs.
arXiv Detail & Related papers (2021-05-10T17:59:01Z)
- Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly for the network properties after supervised learning, using an equilibrium Gibbs distribution in weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.