Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization
- URL: http://arxiv.org/abs/2510.21207v1
- Date: Fri, 24 Oct 2025 07:18:24 GMT
- Title: Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization
- Authors: Yunlong Chu, Minglai Shao, Zengyi Wo, Bing Hao, Yuhang Liu, Ruijie Wang, Jianxin Li
- Abstract summary: We introduce ADaMoRE, a principled framework that enables robust, fully unsupervised training of heterogeneous MoE on graphs. A structurally-aware gating network performs fine-grained node routing. Our design improves data efficiency and training stability.
- Score: 17.89950704690598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) face a fundamental adaptability challenge: their fixed message-passing architectures struggle with the immense diversity of real-world graphs, where optimal computational strategies vary by local structure and task. While Mixture-of-Experts (MoE) offers a promising pathway to adaptability, existing graph MoE methods remain constrained by their reliance on supervised signals and instability when training heterogeneous experts. We introduce ADaMoRE (Adaptive Mixture of Residual Experts), a principled framework that enables robust, fully unsupervised training of heterogeneous MoE on graphs. ADaMoRE employs a backbone-residual expert architecture where foundational encoders provide stability while specialized residual experts capture diverse computational patterns. A structurally-aware gating network performs fine-grained node routing. The entire architecture is trained end-to-end using a unified unsupervised objective, which integrates a primary reconstruction task with an information-theoretic diversity regularizer to explicitly enforce functional specialization among the experts. Theoretical analysis confirms our design improves data efficiency and training stability. Extensive evaluation across 16 benchmarks validates ADaMoRE's state-of-the-art performance in unsupervised node classification and few-shot learning, alongside superior generalization, training efficiency, and faster convergence on diverse graphs and tasks.
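As a rough illustration of the backbone-residual design, here is a minimal PyTorch sketch. The layer sizes, the log-degree gating feature, the feature-reconstruction head, and the entropy-style diversity penalty are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One mean-aggregation message-passing layer over a dense adjacency."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return F.relu(self.lin(adj @ x / deg))

class ResidualMoE(nn.Module):
    """Shared backbone encoder plus gated residual experts (illustrative)."""
    def __init__(self, d_in, d_hid, n_experts=4):
        super().__init__()
        self.backbone = GCNLayer(d_in, d_hid)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(),
                          nn.Linear(d_hid, d_hid))
            for _ in range(n_experts)
        ])
        # The gate sees node features plus a structural cue (log-degree).
        self.gate = nn.Linear(d_hid + 1, n_experts)
        self.decoder = nn.Linear(d_hid, d_in)  # feature-reconstruction head

    def forward(self, x, adj):
        h = self.backbone(x, adj)
        logdeg = adj.sum(dim=1, keepdim=True).clamp(min=1.0).log()
        w = F.softmax(self.gate(torch.cat([h, logdeg], dim=-1)), dim=-1)
        res = sum(w[:, i:i + 1] * e(h) for i, e in enumerate(self.experts))
        z = h + res                              # backbone + residual experts
        recon = F.mse_loss(self.decoder(z), x)   # unsupervised reconstruction
        usage = w.mean(dim=0)                    # average routing per expert
        collapse = (usage * usage.clamp(min=1e-9).log()).sum()
        return z, recon + 0.1 * collapse         # penalize routing collapse

# Toy usage on a random graph.
x, adj = torch.randn(32, 16), (torch.rand(32, 32) > 0.8).float()
z, loss = ResidualMoE(d_in=16, d_hid=32)(x, adj)
loss.backward()
```

The structural point is that experts contribute only residual corrections on top of a shared backbone, so even a degenerate gate falls back to a stable encoder, which is the stability argument the abstract makes.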
Related papers
- Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts [60.60414602796664]
We propose a novel MoE framework with evolutionary router feature generation (EvoFG) for zero-shot GAD.
EvoFG consistently outperforms state-of-the-art baselines, achieving strong and stable zero-shot GAD performance.
arXiv Detail & Related papers (2026-02-12T06:16:51Z) - OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.
Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.
We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z) - Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models.
We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution.
We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
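To make "each rank in LoRA as an independent expert" concrete, here is a hedged PyTorch sketch: the low-rank update BA is split rank-by-rank and re-weighted by a router conditioned on a degradation embedding. The router design and all names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfRanks(nn.Module):
    """LoRA update split per rank; a degradation-aware router gates each rank."""
    def __init__(self, d_model, rank=8, d_cond=16):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)  # stands in for a frozen layer
        self.base.requires_grad_(False)
        self.A = nn.Parameter(0.01 * torch.randn(rank, d_model))
        self.B = nn.Parameter(torch.zeros(d_model, rank))
        self.router = nn.Linear(d_cond, rank)    # degradation embedding -> gates

    def forward(self, x, cond):
        g = torch.sigmoid(self.router(cond))       # (batch, rank) rank gates
        h = F.linear(x, self.A) * g                # each rank routed independently
        return self.base(x) + F.linear(h, self.B)  # W x + B diag(g) A x
```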
arXiv Detail & Related papers (2025-11-20T04:11:44Z) - Self-Adaptive Graph Mixture of Models [4.3009319001455975]
Self-Adaptive Graph Mixture of Models (SAGMM) is a modular and practical framework that learns to automatically select and combine the most appropriate GNN models.
We evaluate SAGMM on 16 benchmark datasets covering node classification, graph classification, regression, and link prediction tasks.
arXiv Detail & Related papers (2025-11-17T07:11:06Z) - GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited.
We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs.
We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
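One plausible reading of a prompt-expert mixture, sketched below purely as an assumption: each expert is a learnable prompt vector added to node features, and a soft router mixes them. GMoPE's actual prompt placement and routing may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptMoE(nn.Module):
    """Each 'expert' is a learnable prompt vector; a soft router mixes them."""
    def __init__(self, d, n_prompts=4):
        super().__init__()
        self.prompts = nn.Parameter(0.01 * torch.randn(n_prompts, d))
        self.router = nn.Linear(d, n_prompts)

    def forward(self, x):                        # x: (N, d) node features
        w = F.softmax(self.router(x), dim=-1)    # (N, n_prompts) routing weights
        return x + w @ self.prompts              # prompt-conditioned features
```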
arXiv Detail & Related papers (2025-11-05T07:28:51Z) - Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information [91.66597637613263]
Transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs.
We introduce a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence.
We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum.
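The summary does not define KG-MI, but the generic $f$-divergence form of mutual information it presumably builds on is standard; where the kernel guidance enters (e.g., in estimating the density ratio) is our assumption:

```latex
% f-divergence form of mutual information; with f(t) = t log t this
% recovers Shannon mutual information.
I_f(X;Y) \;=\; D_f\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right)
        \;=\; \mathbb{E}_{P_X \otimes P_Y}\!\left[
              f\!\left(\frac{\mathrm{d}P_{XY}}{\mathrm{d}\,(P_X \otimes P_Y)}\right)
              \right].
```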
arXiv Detail & Related papers (2025-10-29T14:07:12Z) - Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study [15.65200571307458]
We present the first systematic empirical study of expert-level diversification techniques for GNN ensembles.
We evaluate 20 diversification strategies across 14 node classification benchmarks.
Our comprehensive evaluation examines each technique in terms of expert diversity, complementarity, and ensemble performance.
arXiv Detail & Related papers (2025-10-21T07:40:51Z) - Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts [11.437368205968573]
This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts.
We show that post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality.
arXiv Detail & Related papers (2025-10-08T16:40:31Z) - Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization [4.502753947356616]
We introduce the Graph Foundation Model (GFM), the first framework capable of solving all distance-based optimization problems on graph structures.
GFM internalizes the graph's complex topological and neural rules, where the connectivity of the structure itself can be treated as the supervisory signal.
Our work establishes a new paradigm of adapting the pretrain-transfer framework to graph optimization, opening the door for applying foundation model innovations to Operations Research.
arXiv Detail & Related papers (2025-09-29T04:05:48Z) - SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images [8.23277995673829]
We introduce SynthGenNet, a self-supervised student-teacher architecture that enables robust test-time domain generalization.
Our contributions include the novel ClassMix++ algorithm, which blends labeled data from various synthetic sources.
We show our model outperforms the single-source state-of-the-art, achieving a 50% mean Intersection-over-Union (mIoU) on real-world datasets.
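For reference, the original ClassMix operation, which ClassMix++ presumably extends to multiple synthetic sources, can be sketched as follows; the function name and tensor layout are assumptions:

```python
import torch

def classmix(img_a, lbl_a, img_b, lbl_b):
    """Paste a random half of image A's classes onto image B.

    img_*: (C, H, W) float tensors; lbl_*: (H, W) long tensors.
    """
    classes = torch.unique(lbl_a)
    keep = classes[torch.randperm(len(classes))[: max(1, len(classes) // 2)]]
    mask = torch.isin(lbl_a, keep).unsqueeze(0)      # (1, H, W) bool
    mixed_img = torch.where(mask, img_a, img_b)      # broadcast over channels
    mixed_lbl = torch.where(mask.squeeze(0), lbl_a, lbl_b)
    return mixed_img, mixed_lbl
```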
arXiv Detail & Related papers (2025-09-02T13:08:03Z) - Graph Structure Refinement with Energy-based Contrastive Learning [56.957793274727514]
We introduce an unsupervised method that combines generative and discriminative training to learn graph structure and representation.
We propose an Energy-based Contrastive Learning (ECL) guided Graph Structure Refinement (GSR) framework, denoted as ECL-GSR.
ECL-GSR trains faster with fewer samples and less memory than the leading baseline, highlighting its simplicity and efficiency in downstream tasks.
arXiv Detail & Related papers (2024-12-20T04:05:09Z) - Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation [66.40525136929398]
Test-time adaptation (TTA) has attracted attention due to its ability to adapt a pre-trained model to a target domain, without re-accessing the source domain.
We propose Matcha, an innovative framework designed for effective and efficient adaptation to structure shifts in graphs.
We validate the effectiveness of Matcha on both synthetic and real-world datasets, demonstrating its robustness across various combinations of structure and attribute shifts.
arXiv Detail & Related papers (2024-10-09T15:15:40Z) - Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
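A toy NumPy simulation of analog over-the-air aggregation, under simplifying assumptions (perfect channel knowledge and channel inversion at each client; the paper's actual power-control scheme will differ):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 10, 1000                        # clients, model dimension
updates = rng.normal(size=(K, d))      # local model updates
h = rng.rayleigh(scale=1.0, size=K)    # channel gains, known to clients

tx = updates / h[:, None]              # clients pre-invert their channels
rx = (h[:, None] * tx).sum(axis=0)     # superposition in one channel use
rx += rng.normal(scale=0.1, size=d)    # receiver noise

avg = rx / K                           # server's estimate of the mean update
true = updates.mean(axis=0)
print("relative error:", np.linalg.norm(avg - true) / np.linalg.norm(true))
```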
arXiv Detail & Related papers (2023-02-24T08:41:19Z) - An Optimization-Based Meta-Learning Model for MRI Reconstruction with Diverse Dataset [4.9259403018534496]
We develop a generalizable MRI reconstruction model in the meta-learning framework.
The proposed network learns the regularization function within a learner-adaptive model.
We test rapid adaptation to unseen tasks after meta-training, which saves half of the training time.
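As a generic stand-in for the meta-learning loop (a first-order Reptile-style update, not the paper's optimization-based bilevel scheme), the following sketch shows how meta-training enables quick adaptation to unseen tasks:

```python
import copy
import torch
import torch.nn as nn

def reptile_step(model, sample_task_batch, inner_steps=5,
                 inner_lr=0.01, meta_lr=0.1):
    """One meta-update: adapt a clone to a task, then move toward it."""
    loss_fn = nn.MSELoss()
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        x, y = sample_task_batch()       # batch from one training task
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    with torch.no_grad():                # meta-parameters drift toward the
        for p, q in zip(model.parameters(), fast.parameters()):
            p += meta_lr * (q - p)       # task-adapted weights
```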
arXiv Detail & Related papers (2021-10-02T03:21:52Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where, at every iteration, a random subset of the available agents performs local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
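The setting can be mimicked in a few lines of NumPy under illustrative assumptions (quadratic local losses, Gaussian drift of the true minimizer): each iteration, a random subset of agents takes a local step and the server averages the results.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, T, lr, frac = 20, 5, 200, 0.1, 0.3
w_star = np.zeros(d)                    # true minimizer (random walk)
w = np.zeros(d)

for t in range(T):
    w_star += 0.01 * rng.normal(size=d)          # non-stationary target
    active = rng.choice(K, size=int(frac * K), replace=False)
    steps = []
    for k in active:
        noise = 0.1 * rng.normal(size=d)         # per-agent data variability
        grad = (w - w_star) + noise              # gradient of a local quadratic
        steps.append(w - lr * grad)
    w = np.mean(steps, axis=0)                   # server aggregates the subset

print("tracking error:", np.linalg.norm(w - w_star))
```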
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.