Supercharging Graph Transformers with Advective Diffusion
- URL: http://arxiv.org/abs/2310.06417v3
- Date: Sat, 31 May 2025 17:50:56 GMT
- Title: Supercharging Graph Transformers with Advective Diffusion
- Authors: Qitian Wu, Chenxiao Yang, Kaipeng Zeng, Michael Bronstein,
- Abstract summary: This paper proposes AdvDIFFormer, a physics-inspired graph Transformer model designed to address this challenge. We show that AdvDIFFormer has provable capability for controlling generalization error with topological shifts. Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
- Score: 28.40109111316014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capability of generalization is a cornerstone for the success of modern learning systems. For non-Euclidean data, e.g., graphs, that particularly involves topological structures, one important aspect neglected by prior studies is how machine learning models generalize under topological shifts. This paper proposes AdvDIFFormer, a physics-inspired graph Transformer model designed to address this challenge. The model is derived from advective diffusion equations which describe a class of continuous message passing process with observed and latent topological structures. We show that AdvDIFFormer has provable capability for controlling generalization error with topological shifts, which in contrast cannot be guaranteed by graph diffusion models. Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
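The abstract's core idea, message passing governed by an advection-diffusion equation over observed and latent structures, can be illustrated with a minimal NumPy sketch. This is an illustration only, not the paper's actual AdvDIFFormer formulation: the step size `dt`, the mixing weight `beta`, and the softmax attention used for the latent matrix `V` are all assumptions made here.

```python
import numpy as np

def advective_diffusion_step(x, A, V, dt=0.1, beta=0.5):
    """One explicit-Euler step of an advection-diffusion update on a graph.

    x    : (n, d) node features
    A    : (n, n) symmetric adjacency (observed topology), drives advection
    V    : (n, n) row-normalized dense attention (latent topology), drives diffusion
    beta : weight trading off advection against diffusion (illustrative choice)
    """
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    P = A / deg                       # random-walk normalized adjacency
    diffusion = V @ x - x             # global, attention-driven diffusion term
    advection = P @ x - x             # local transport along observed edges
    return x + dt * (diffusion + beta * advection)

# Toy example: 4 nodes on a path graph.
n, d = 4, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = np.exp(x @ x.T)              # softmax attention as the latent structure
V = scores / scores.sum(axis=1, keepdims=True)
x_next = advective_diffusion_step(x, A, V)
print(x_next.shape)  # (4, 3)
```

The point of the decomposition is that the diffusion term depends only on feature similarity, while the advection term carries the observed topology, so the two can shift independently.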
Related papers
- Do Graph Diffusion Models Accurately Capture and Generate Substructure Distributions? [28.19526635775658]
Diffusion models do not possess universal expressivity to accurately model the distribution scores of complex graph data.
Our work addresses this limitation by focusing on the frequency of specific substructures as a key characteristic of target graph distributions.
We establish a theoretical connection between the expressivity of Graph Neural Networks (GNNs) and the overall performance of graph diffusion models.
arXiv Detail & Related papers (2025-02-04T17:04:16Z) - Transformers from Diffusion: A Unified Framework for Neural Message Passing [79.9193447649011]
Message passing neural networks (MPNNs) have become a de facto class of model solutions. We propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion with layer-wise constraints of energy. Building on these insights, we devise a new class of message passing models, dubbed diffusion-based Transformers (DIFFormer), whose global attention layers are derived from the principled energy-constrained diffusion framework.
arXiv Detail & Related papers (2024-09-13T17:54:41Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem of learning with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Graph Neural Aggregation-diffusion with Metastability [4.040326569845733]
Continuous graph neural models based on differential equations have expanded the architecture of graph neural networks (GNNs).
We propose GRADE inspired by graph aggregation-diffusion equations, which includes the delicate balance between nonlinear diffusion and aggregation induced by interaction potentials.
We show that GRADE achieves competitive performance across various benchmarks and alleviates the over-smoothing issue in GNNs.
arXiv Detail & Related papers (2024-03-29T15:05:57Z) - Revealing Decurve Flows for Generalized Graph Propagation [108.80758541147418]
This study addresses the limitations of the traditional analysis of message-passing, central to graph learning, by defining generalized propagation with directed and weighted graphs.
We include a preliminary exploration of learned propagation patterns in datasets, a first in the field.
arXiv Detail & Related papers (2024-02-13T14:13:17Z) - PGODE: Towards High-quality System Dynamics Modeling [40.76121531452706]
This paper studies the problem of modeling multi-agent dynamical systems, where agents could interact mutually to influence their behaviors.
Recent research predominantly uses geometric graphs to depict these mutual interactions, which are then captured by graph neural networks (GNNs).
We propose a new approach named Prototypical Graph ODE to address the problem.
arXiv Detail & Related papers (2023-11-11T12:04:47Z) - Gramian Angular Fields for leveraging pretrained computer vision models with anomalous diffusion trajectories [0.9012198585960443]
We present a new data-driven method for working with diffusive trajectories.
This method utilizes Gramian Angular Fields (GAF) to encode one-dimensional trajectories as images.
We leverage two well-established pre-trained computer-vision models, ResNet and MobileNet, to characterize the underlying diffusive regime.
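The GAF encoding itself is straightforward to sketch: rescale the trajectory to [-1, 1], map values to angles via arccos, and form the Gramian of angular sums (GASF) or differences (GADF). The NumPy snippet below is a minimal version under those standard definitions; the exact rescaling and image size used by the paper are not specified here.

```python
import numpy as np

def gramian_angular_field(series, kind="summation"):
    """Encode a 1-D trajectory as a Gramian Angular Field image.

    Rescales the series to [-1, 1], maps values to angles phi = arccos(x),
    then builds GASF = cos(phi_i + phi_j) or GADF = sin(phi_i - phi_j).
    """
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    # Min-max rescale to [-1, 1]; constant series maps to all zeros.
    x = 2.0 * (s - lo) / (hi - lo) - 1.0 if hi > lo else np.zeros_like(s)
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    if kind == "summation":
        return np.cos(phi[:, None] + phi[None, :])   # GASF
    return np.sin(phi[:, None] - phi[None, :])       # GADF

# A random-walk trajectory becomes a 64x64 image, ready for a CNN backbone.
traj = np.cumsum(np.random.default_rng(1).normal(size=64))
img = gramian_angular_field(traj)
print(img.shape)  # (64, 64)
```

The resulting 2-D image is what gets fed to the pre-trained vision models (ResNet, MobileNet) for regime classification.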
arXiv Detail & Related papers (2023-09-02T17:22:45Z) - SEGNO: Generalizing Equivariant Graph Neural Networks with Physical Inductive Biases [66.61789780666727]
We show how the second-order continuity can be incorporated into GNNs while maintaining the equivariant property.
We also offer theoretical insights into SEGNO, highlighting that it can learn a unique trajectory between adjacent states.
Our model yields a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2023-08-25T07:15:58Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - MentorGNN: Deriving Curriculum for Pre-Training GNNs [61.97574489259085]
We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs.
We shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs.
arXiv Detail & Related papers (2022-08-21T15:12:08Z) - Learning Graph Structure from Convolutional Mixtures [119.45320143101381]
We propose a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem.
In lieu of eigendecomposition-based spectral methods, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN).
GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression tasks by adapting the loss function, and they are inherently inductive.
arXiv Detail & Related papers (2022-05-19T14:08:15Z) - Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs [16.88394293874848]
We use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs.
By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands.
We prove that when the sheaf is non-trivial, discretised parametric diffusion processes have greater control than GNNs over their behaviour.
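A sheaf diffusion process can be sketched concretely: each node carries a d-dimensional stalk, each edge carries restriction maps, and features evolve by dx/dt = -L_F x, where L_F is the sheaf Laplacian. The construction below (random orthogonal restriction maps, explicit Euler) is a hypothetical minimal example, not the paper's learned sheaves.

```python
import numpy as np

def sheaf_laplacian(edges, F, n, d):
    """Build the sheaf Laplacian from restriction maps.

    edges : list of (u, v) node-index pairs
    F     : dict mapping (edge_index, node_index) -> (d, d) restriction map
    Stalks are R^d at every node, so L_F is (n*d, n*d) and positive semidefinite.
    """
    L = np.zeros((n * d, n * d))
    for e, (u, v) in enumerate(edges):
        Fu, Fv = F[(e, u)], F[(e, v)]
        # Quadratic form of the edge discrepancy Fu x_u - Fv x_v.
        L[u*d:(u+1)*d, u*d:(u+1)*d] += Fu.T @ Fu
        L[v*d:(v+1)*d, v*d:(v+1)*d] += Fv.T @ Fv
        L[u*d:(u+1)*d, v*d:(v+1)*d] -= Fu.T @ Fv
        L[v*d:(v+1)*d, u*d:(u+1)*d] -= Fv.T @ Fu
    return L

def sheaf_diffusion(x, L, n_steps=50, dt=0.1):
    """Explicit Euler on dx/dt = -L_F x."""
    for _ in range(n_steps):
        x = x - dt * (L @ x)
    return x

# Triangle graph with 2-dimensional stalks and random orthogonal restrictions.
n, d = 3, 2
edges = [(0, 1), (1, 2), (0, 2)]
rng = np.random.default_rng(2)
F = {}
for e, (u, v) in enumerate(edges):
    qu, _ = np.linalg.qr(rng.normal(size=(d, d)))
    qv, _ = np.linalg.qr(rng.normal(size=(d, d)))
    F[(e, u)], F[(e, v)] = qu, qv
L = sheaf_laplacian(edges, F, n, d)
x = rng.normal(size=(n * d, 1))
x_T = sheaf_diffusion(x, L)
```

With identity restriction maps this reduces to ordinary graph Laplacian diffusion; non-trivial maps are what give the process the extra control over its behaviour that the paper studies.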
arXiv Detail & Related papers (2022-02-09T17:25:02Z) - Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
arXiv Detail & Related papers (2021-12-14T14:48:08Z) - Generalization of graph network inferences in higher-order graphical models [5.33024001730262]
Probabilistic graphical models provide a powerful tool to describe complex statistical structure.
However, inferences such as marginalization are intractable for general graphs.
We define the Recurrent Factor Graph Neural Network (RF-GNN) to achieve fast approximate inference on graphical models that involve many-variable interactions.
arXiv Detail & Related papers (2021-07-12T20:51:27Z) - GRAND: Graph Neural Diffusion [15.00135729657076]
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process.
In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators.
Key to the success of our model is stability with respect to perturbations in the data, which is addressed for both implicit and explicit discretisation schemes.
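GRAND's continuous view can be sketched as an explicit-Euler discretisation of graph diffusion, where each Euler step plays the role of a layer. The sketch below uses a fixed row-normalized adjacency in place of GRAND's learned attention matrix, which is an assumption made purely for illustration.

```python
import numpy as np

def grand_forward(x0, A, n_steps=10, dt=0.2):
    """Explicit-Euler discretisation of graph diffusion dx/dt = (A_hat - I) x.

    Each Euler step corresponds to one 'layer'; A_hat is the row-normalized
    adjacency, standing in for a learned attention matrix.
    """
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    A_hat = A / deg
    x = x0
    for _ in range(n_steps):
        x = x + dt * (A_hat @ x - x)   # features flow toward neighbor averages
    return x

# Triangle graph: a unit impulse on node 0 diffuses toward the mean (1/3).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
x0 = np.array([[1.0], [0.0], [0.0]])
x_T = grand_forward(x0, A)
```

Choosing `dt` and `n_steps` is exactly the discretisation choice the abstract refers to: too large a step makes the explicit scheme unstable, which is why the paper also considers implicit schemes.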
arXiv Detail & Related papers (2021-06-21T09:10:57Z) - Hyperbolic Variational Graph Neural Network for Modeling Dynamic Graphs [77.33781731432163]
For the first time, we learn dynamic graph representations in hyperbolic space, aiming to infer node representations.
We present a novel Hyperbolic Variational Graph Network, referred to as HVGNN.
In particular, to model the dynamics, we introduce a Temporal GNN (TGNN) based on a theoretically grounded time encoding approach.
arXiv Detail & Related papers (2021-04-06T01:44:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.