Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study
- URL: http://arxiv.org/abs/2510.18370v1
- Date: Tue, 21 Oct 2025 07:40:51 GMT
- Title: Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study
- Authors: Gangda Deng, Yuxin Yang, Ömer Faruk Akgül, Hanqing Zeng, Yinglong Xia, Rajgopal Kannan, Viktor Prasanna
- Abstract summary: We present the first systematic empirical study of expert-level diversification techniques for GNN ensembles. We evaluate 20 diversification strategies across 14 node classification benchmarks. Our comprehensive evaluation examines each technique in terms of expert diversity, complementarity, and ensemble performance.
- Score: 15.65200571307458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Neural Networks (GNNs) have become essential tools for learning on relational data, yet the performance of a single GNN is often limited by the heterogeneity present in real-world graphs. Recent advances in Mixture-of-Experts (MoE) frameworks demonstrate that assembling multiple, explicitly diverse GNNs with distinct generalization patterns can significantly improve performance. In this work, we present the first systematic empirical study of expert-level diversification techniques for GNN ensembles. Evaluating 20 diversification strategies -- including random re-initialization, hyperparameter tuning, architectural variation, directionality modeling, and training data partitioning -- across 14 node classification benchmarks, we construct and analyze over 200 ensemble variants. Our comprehensive evaluation examines each technique in terms of expert diversity, complementarity, and ensemble performance. We also uncover mechanistic insights into training maximally diverse experts. These findings provide actionable guidance for expert training and the design of effective MoE frameworks on graph data. Our code is available at https://github.com/Hydrapse/bench-gnn-diversification.
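The abstract's core recipe (train several experts that differ in some controlled way, combine their predictions, and measure how much they disagree) can be sketched in a toy, library-free form. Everything below is illustrative: the "experts" are random linear scorers diversified only by their seed (a stand-in for the paper's random re-initialization strategy), not the authors' implementation.

```python
import random

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

def make_expert(seed, num_classes=3, dim=2):
    # A stand-in "GNN expert": a random linear scorer whose only source
    # of diversity is its initialization seed.
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]
    def scores(x):
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return scores

def ensemble_predict(experts, x):
    # Average per-class scores across experts, then take the argmax.
    all_scores = [expert(x) for expert in experts]
    avg = [sum(col) / len(all_scores) for col in zip(*all_scores)]
    return argmax(avg)

def disagreement(experts, probes):
    # Fraction of (expert pair, probe) combinations whose argmax differs:
    # a crude proxy for the expert-diversity measures studied in the paper.
    diff = total = 0
    for i in range(len(experts)):
        for j in range(i + 1, len(experts)):
            for x in probes:
                diff += argmax(experts[i](x)) != argmax(experts[j](x))
                total += 1
    return diff / total

experts = [make_expert(seed) for seed in range(5)]
probes = [(0.5, -1.0), (1.0, 1.0), (-0.3, 0.8), (0.0, -0.2)]
prediction = ensemble_predict(experts, probes[0])
diversity = disagreement(experts, probes)
```

A real ensemble would replace the linear scorers with trained GNNs; the paper's other strategies (hyperparameter, architectural, directionality, and data-partition diversification) change what differs between experts, not how their outputs are combined.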
Related papers
- pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation [68.3777121585281]
We propose a novel Mixture-of-Experts prompt tuning method called pMoE. The proposed pMoE significantly enhances the model's versatility and applicability to a broad spectrum of tasks. We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains.
arXiv Detail & Related papers (2026-02-26T12:27:06Z) - Self-Adaptive Graph Mixture of Models [4.3009319001455975]
Self-Adaptive Graph Mixture of Models (SAGMM) is a modular and practical framework that learns to automatically select and combine the most appropriate GNN models. We evaluate SAGMM on 16 benchmark datasets covering node classification, graph classification, regression, and link prediction tasks.
arXiv Detail & Related papers (2025-11-17T07:11:06Z) - GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
arXiv Detail & Related papers (2025-11-05T07:28:51Z) - Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information [91.66597637613263]
Transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. We introduce a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum.
arXiv Detail & Related papers (2025-10-29T14:07:12Z) - Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization [17.89950704690598]
We introduce ADaMoRE, a principled framework that enables robust, fully unsupervised training of heterogeneous MoE on graphs. A structurally-aware gating network performs fine-grained node routing. Our design improves data efficiency and training stability.
arXiv Detail & Related papers (2025-10-24T07:18:24Z) - Exploring Graph-Transformer Out-of-Distribution Generalization Abilities [2.4063592468412276]
Graph-transformer (GT) backbones have recently outperformed traditional message-passing neural networks (MPNNs) in multiple in-distribution (ID) benchmarks. In this work, we address the challenge of out-of-distribution (OOD) generalization for graph neural networks. We adapt several leading domain generalization (DG) algorithms to work with GTs and assess their performance on a benchmark designed to test a variety of distribution shifts.
arXiv Detail & Related papers (2025-06-25T16:09:24Z) - DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts [70.21017141742763]
Graph neural networks (GNNs) are gaining popularity for processing graph-structured data.
Existing methods generally use a fixed number of GNN layers to generate representations for all graphs.
We propose the depth-adaptive mixture of experts (DA-MoE) method, which incorporates two main improvements to GNNs.
arXiv Detail & Related papers (2024-11-05T11:46:27Z) - Enhancing GNNs with Architecture-Agnostic Graph Transformations: A Systematic Analysis [0.4069144210024563]
This study explores the impact of various graph transformations as pre-processing steps on the performance of common graph neural network (GNN) architectures across standard datasets.
Our findings reveal that certain transformations, particularly those augmenting node features with centrality measures, consistently improve expressivity.
However, these gains come with trade-offs: methods like graph encoding, while enhancing expressivity, introduce numerical inaccuracies in widely-used Python packages.
arXiv Detail & Related papers (2024-10-11T12:19:17Z) - Learning Invariant Representations of Graph Neural Networks via Cluster
Generalization [58.68231635082891]
Graph neural networks (GNNs) have become increasingly popular in modeling graph-structured data.
In this paper, we experimentally find that the performance of GNNs drops significantly when a structure shift occurs.
We propose the Cluster Information Transfer (CIT) mechanism, which can learn invariant representations for GNNs.
arXiv Detail & Related papers (2024-03-06T10:36:56Z) - GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts [75.51612253852002]
GraphMETRO is a Graph Neural Network architecture that models natural diversity and captures complex distributional shifts.
GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark.
arXiv Detail & Related papers (2023-12-07T20:56:07Z) - G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima [17.473268736086137]
We propose a new learning framework called Generalized-Mixup, which combines the strengths of Mixup and SAM for training DNN models.
We introduce two novel algorithms: Binary G-Mix and Decomposed G-Mix, which partition the training data into two subsets based on the sharpness-sensitivity of each example.
Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models.
arXiv Detail & Related papers (2023-08-07T01:25:10Z) - Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit
Diversity Modeling [60.0185734837814]
Graph neural networks (GNNs) have found extensive applications in learning from graph data.
To bolster the generalization capacity of GNNs, it has become customary to enrich training graph structures with techniques such as graph augmentation.
This study introduces the concept of Mixture-of-Experts (MoE) to GNNs, with the aim of augmenting their capacity to adapt to a diverse range of training graph structures.
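The MoE pattern this entry describes (a gate producing per-node mixture weights over several expert scorers, whose outputs are then blended) admits a minimal, library-free sketch. All names, the dot-product gate, and the hand-set weights below are assumptions for illustration, not GMoE's actual design.

```python
import math

def softmax(zs):
    # Numerically stable softmax over a list of logits.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights):
    # Gate: one dot-product logit per expert, normalized to mixture weights.
    logits = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate_weights]
    mix = softmax(logits)
    # Each expert returns per-class scores; the output is their weighted sum.
    outputs = [expert(x) for expert in experts]
    num_classes = len(outputs[0])
    return [sum(m * out[c] for m, out in zip(mix, outputs)) for c in range(num_classes)]

# Two toy experts with opposite per-class scores, and a hand-set gate that
# attends to a different input coordinate for each expert.
experts = [lambda x: [x[0], -x[0]], lambda x: [x[1], -x[1]]]
gate_weights = [(1.0, 0.0), (0.0, 1.0)]
out = moe_forward((2.0, -1.0), experts, gate_weights)
```

In a graph setting the gate would typically condition on node features or local structure, so different regions of the graph are routed to different experts.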
arXiv Detail & Related papers (2023-04-06T01:09:36Z) - Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive
Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z) - Principal Neighbourhood Aggregation for Graph Nets [4.339839287869653]
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data.
Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces.
We extend this theoretical framework to include continuous features which occur regularly in real-world input domains.
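The principal-neighbourhood-aggregation idea (several aggregators crossed with degree-based scalers, concatenated into one feature vector) can be sketched on scalar neighbour features as follows. The specific aggregators, the log-degree scaler, and the constant below are illustrative assumptions, not the authors' implementation.

```python
import math

def pna_aggregate(neighbour_values, avg_degree=4.0):
    # Multiple aggregators over a node's neighbour features.
    d = len(neighbour_values)
    mean = sum(neighbour_values) / d
    var = sum((v - mean) ** 2 for v in neighbour_values) / d
    aggregators = [mean, max(neighbour_values), min(neighbour_values), math.sqrt(var)]
    # Degree scalers: identity, amplification, attenuation (log-based).
    s = math.log(d + 1) / math.log(avg_degree + 1)
    scalers = [1.0, s, 1.0 / s]
    # Cross product of aggregators and scalers -> concatenated features.
    return [a * c for a in aggregators for c in scalers]

# 4 aggregators x 3 scalers = 12 output features for this node.
feats = pna_aggregate([1.0, 2.0, 3.0, 6.0])
```

Combining complementary aggregators is what lets the scheme distinguish neighbourhoods that any single aggregator (e.g. mean alone) would conflate.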
arXiv Detail & Related papers (2020-04-12T23:30:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.