Generalizing GNNs with Tokenized Mixture of Experts
- URL: http://arxiv.org/abs/2602.09258v1
- Date: Mon, 09 Feb 2026 22:48:30 GMT
- Title: Generalizing GNNs with Tokenized Mixture of Experts
- Authors: Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang
- Abstract summary: We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
- Score: 75.8310720413187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. Instance-conditional routing can break this ceiling, but is fragile because shifts can mislead routing and perturbations can make routing fluctuate. We capture these effects via two decompositions separating coverage vs selection, and base sensitivity vs fluctuation amplification. Based on these insights, we propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths, a vector-quantized token interface to stabilize encoder-to-head signals, and a Lipschitz-regularized head to bound output amplification. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
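The three components named in the abstract (instance-conditional expert routing, a vector-quantized token interface, and a norm-bounded head) can be illustrated with a toy sketch. This is not the authors' implementation: the gate weights, experts, codebook, and all function names below are hypothetical, and the Frobenius-norm rescaling is just one simple way to bound the Lipschitz constant of a linear head, chosen here as an assumption.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(x, gate_weights):
    # Instance-conditional routing: one linear gate score per expert.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    return softmax(scores)

def moe_encode(x, gate_weights, experts):
    # Weighted mixture of expert outputs, weights given by the router.
    probs = route(x, gate_weights)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(p * out[d] for p, out in zip(probs, outputs)) for d in range(dim)]

def quantize(z, codebook):
    # Vector quantization: index of the nearest codebook vector.
    dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, code)) for code in codebook]
    return dists.index(min(dists))

def bound_head(weights, bound=1.0):
    # Rescale so the Frobenius norm is at most `bound`. The spectral norm
    # never exceeds the Frobenius norm, so the linear map x -> W x is then
    # `bound`-Lipschitz in the L2 norm.
    frob = math.sqrt(sum(w * w for row in weights for w in row))
    scale = min(1.0, bound / frob) if frob > 0 else 1.0
    return [[w * scale for w in row] for row in weights]

# Hypothetical toy setup: two experts, a three-entry codebook, a 1x2 head.
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    lambda x: [x[0] + x[1], 0.0],
    lambda x: [0.0, x[0] - x[1]],
]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
head = bound_head([[3.0, 4.0]], bound=1.0)

x_clean = [1.0, 0.2]
x_noisy = [1.05, 0.18]  # small feature perturbation

token_clean = quantize(moe_encode(x_clean, gate_weights, experts), codebook)
token_noisy = quantize(moe_encode(x_noisy, gate_weights, experts), codebook)
print(token_clean, token_noisy, head)
```

In this toy setup the small perturbation changes the routing probabilities and the continuous embedding slightly, but both inputs snap to the same codebook entry, so a head that reads only the token sees an identical input. That is the intuition behind using a discrete interface to damp routing fluctuation.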
Related papers
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment [27.352639822596146]
Cross-worker divergence in losses and gradients can remain invisible under conventional monitoring signals. We propose a model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines.
arXiv Detail & Related papers (2026-02-16T04:42:30Z)
- Variational Bayesian Flow Network for Graph Generation [54.94088904387278]
We propose Variational Bayesian Flow Network (VBFN) for graph generation. VBFN performs variational lifting to a tractable joint Gaussian variational belief family governed by structured precisions. On synthetic and molecular graph datasets, VBFN improves fidelity and diversity, and surpasses baseline methods.
arXiv Detail & Related papers (2026-01-30T03:59:38Z)
- Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks [4.303580795892996]
Decoupled Graph Convolutional Network (D-GCN) is a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN attains 3.3% NMAE, outperforms strong baselines, and remains tractable even when exact analytical methods become computationally infeasible.
arXiv Detail & Related papers (2025-10-15T22:13:59Z)
- Sheaf Graph Neural Networks via PAC-Bayes Spectral Optimization [13.021238902084647]
Over-smoothing in Graph Neural Networks (GNNs) causes collapse in distinct node features. We introduce SGPC (Sheaf GNNs with PAC-Bayes), a unified architecture that combines cellular-sheaf message passing with several mechanisms. Experiments on nine homophilic and heterophilic benchmarks show that SGPC outperforms state-of-the-art spectral and sheaf-based GNNs.
arXiv Detail & Related papers (2025-08-01T06:39:28Z)
- Towards Robust Spiking Neural Networks: Mitigating Heterogeneous Training Vulnerability via Dominant Eigencomponent Projection [21.5491519186604]
Spiking Neural Networks (SNNs) process information via discrete spikes, enabling them to operate at remarkably low energy levels. Experiments reveal a striking vulnerability when SNNs are trained using the mainstream method: direct encoding combined with backpropagation through time.
arXiv Detail & Related papers (2025-05-16T11:29:49Z)
- Exact Certification of (Graph) Neural Networks Against Label Poisoning [50.87615167799367]
We introduce an exact certification method for label flipping in Graph Neural Networks (GNNs). We apply our method to certify a broad range of GNN architectures in node classification tasks. Our work presents the first exact certificate to a poisoning attack ever derived for neural networks.
arXiv Detail & Related papers (2024-11-30T17:05:12Z)
- Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks [50.87615167799367]
We certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph. Our framework provides fundamental insights into the role of graph structure and its connectivity on the worst-case behavior of convolution-based and PageRank-based GNNs.
arXiv Detail & Related papers (2024-07-15T16:12:51Z)
- On the Trade-Off between Stability and Representational Capacity in Graph Neural Networks [22.751509906413943]
We study the stability of EdgeNet: a general GNN framework that unifies more than twenty solutions.
By studying the effect of different EdgeNet categories on the stability, we show that GNNs with fewer degrees of freedom in their parameter space, linked to a lower representational capacity, are more stable.
arXiv Detail & Related papers (2023-12-04T22:07:17Z)
- Stable and Transferable Hyper-Graph Neural Networks [95.07035704188984]
We introduce an architecture for processing signals supported on hypergraphs via graph neural networks (GNNs).
We provide a framework for bounding the stability and transferability error of GNNs across arbitrary graphs via spectral similarity.
arXiv Detail & Related papers (2022-11-11T23:44:20Z)
- Generalizing Graph Neural Networks on Out-Of-Distribution Graphs [51.33152272781324]
Graph Neural Networks (GNNs) are proposed without considering the distribution shifts between training and testing graphs.
In such a setting, GNNs tend to exploit subtle statistical correlations in the training set for predictions, even when those correlations are spurious.
We propose a general causal representation framework, called StableGNN, to eliminate the impact of spurious correlations.
arXiv Detail & Related papers (2021-11-20T18:57:18Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.