Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
- URL: http://arxiv.org/abs/2602.17038v1
- Date: Thu, 19 Feb 2026 03:18:30 GMT
- Title: Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
- Authors: Shengtian Yang, Yu Li, Shuo He, Yewen Li, Qingpeng Cai, Peng Jiang, Lei Feng
- Abstract summary: A plausible remedy could be employing the Mixture-of-Experts (MoE) architecture in the policy network. MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. We propose Phase-Aware Mixture of Experts (PA-MoE). It first features a lightweight phase router that learns latent phase boundaries directly from the RL objective without pre-defining phase categories. Then, the phase router allocates temporally consistent assignments to the same expert, allowing experts to preserve phase-specific expertise.
- Score: 23.18318273534301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods normally use a single policy network, causing a "simplicity bias" in which simple tasks occupy most parameters and dominate gradient updates, leaving insufficient capacity for complex tasks. A plausible remedy could be employing the Mixture-of-Experts (MoE) architecture in the policy network, as MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. However, a key limitation of traditional MoE is its token-level routing, where the router assigns each token to specialized experts; this fragments phase-consistent patterns into scattered expert assignments and thus undermines expert specialization. In this paper, we propose Phase-Aware Mixture of Experts (PA-MoE). It first features a lightweight phase router that learns latent phase boundaries directly from the RL objective without pre-defining phase categories. The phase router then assigns tokens within the same phase to the same expert, yielding temporally consistent assignments that allow experts to preserve phase-specific expertise. Experimental results demonstrate the effectiveness of our proposed PA-MoE.
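The abstract describes the phase router only at a high level, so the following is a minimal PyTorch sketch of the routing idea, not the authors' implementation: module names (TokenLevelRouter, PhaseAwareRouter) and hyperparameters (hidden_dim, num_experts, num_phases, the smoothing window) are illustrative assumptions. It contrasts per-token routing, which can scatter a phase across experts, with a phase router that smooths phase scores over time and maps each latent phase to one expert; the paper's actual mechanism for learning phase boundaries from the RL objective is not reproduced here.

```python
# Minimal, hypothetical sketch of token-level vs. phase-aware expert routing.
# Names and hyperparameters are illustrative; they do not come from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenLevelRouter(nn.Module):
    """Standard token-level MoE routing: each token independently picks an
    expert, so a phase-consistent pattern can be fragmented across experts."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) -> expert index per token
        return self.gate(h).argmax(dim=-1)


class PhaseAwareRouter(nn.Module):
    """Sketch of a lightweight phase router: score a few latent phases per
    token, smooth the scores over a local window so neighbouring tokens agree
    on a phase, then map each phase to one expert. In PA-MoE the phase
    boundaries would be learned from the RL objective; only the forward
    routing path is shown here."""

    def __init__(self, hidden_dim: int, num_experts: int,
                 num_phases: int, window: int = 5):
        super().__init__()
        self.phase_scorer = nn.Linear(hidden_dim, num_phases)
        self.phase_to_expert = nn.Linear(num_phases, num_experts)
        self.window = window

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        logits = self.phase_scorer(h)                        # (batch, seq, phases)
        # Temporal smoothing encourages contiguous phase segments.
        smoothed = F.avg_pool1d(
            logits.transpose(1, 2), kernel_size=self.window,
            stride=1, padding=self.window // 2, count_include_pad=False,
        ).transpose(1, 2)
        phase = smoothed.argmax(dim=-1)                      # (batch, seq) phase id
        phase_onehot = F.one_hot(phase, smoothed.size(-1)).float()
        return self.phase_to_expert(phase_onehot).argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    h = torch.randn(2, 32, 64)                               # fake hidden states
    print(TokenLevelRouter(64, 4)(h)[0])                     # usually fragmented
    print(PhaseAwareRouter(64, 4, num_phases=3)(h)[0])       # temporally smoother
```

Running the script prints the two assignment patterns side by side; the phase-aware one tends to be piecewise constant over neighbouring tokens, which is the property the paper argues preserves phase-specific expertise.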
Related papers
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning [83.66308307152808]
We propose StAbilized Mixture-of-Experts (SAME) for Multimodal Continual Instruction Tuning (MCIT). SAME stabilizes expert selection by decomposing routing dynamics into subspaces and updating only task-relevant directions. It also introduces adaptive expert activation to freeze selected experts during training, reducing redundancy and cross-task interference.
arXiv Detail & Related papers (2026-02-02T11:47:06Z)
- MoE Pathfinder: Trajectory-driven Expert Pruning [19.790092938955336]
We propose an expert pruning approach based on the trajectory of activated experts across layers. Our approach achieves superior pruning performance on nearly all tasks compared with most existing approaches.
arXiv Detail & Related papers (2025-12-20T17:05:08Z)
- Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts [11.437368205968573]
This paper advances MoE theory by providing convergence guarantees for the joint training of soft-routed MoE models with non-linear routers and experts. We show that post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality.
arXiv Detail & Related papers (2025-10-08T16:40:31Z)
- Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning [49.90176890917986]
Mixture-of-Experts (MoE) has emerged as a powerful framework for multi-task learning (MTL). Existing MoE-MTL methods often rely on single-task pretrained backbones and suffer from redundant adaptation and inefficient knowledge sharing. We propose adaptive shared experts (ASE) within a low-rank adaptation (LoRA) based MoE, where shared experts are assigned router-computed gating weights jointly normalized with sparse experts (a minimal sketch of this joint normalization appears after this list).
arXiv Detail & Related papers (2025-10-01T06:49:19Z)
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z)
- Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization [51.562474873972086]
Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data. Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt. We propose TRIP, a token-level prompt mixture with parameter-free routing framework for FedDG.
arXiv Detail & Related papers (2025-04-29T11:06:03Z)
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce "complexity experts": flexible expert blocks with varying computational complexity and receptive fields. This complexity preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
- Glider: Global and Local Instruction-Driven Expert Router [83.785832410832]
"Model MoErging" methods prioritize generalization to unseen tasks at the expense of performance on held-in tasks.
We propose Global and Local Instruction Driven Expert Router (GLIDER) that integrates a multi-scale routing mechanism.
GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
arXiv Detail & Related papers (2024-10-09T17:59:14Z)
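As flagged in the ASE entry above, here is a minimal sketch of gating weights that are jointly normalized across shared and sparse experts, under assumed names and shapes (joint_gate, n_shared, n_sparse, top_k); it illustrates the idea in that summary and is not the ASE authors' code. Shared-expert and top-k sparse-expert logits pass through one softmax, so both groups compete on the same scale instead of being normalized separately.

```python
# Hypothetical sketch of jointly normalized gating over shared and sparse
# experts; names and shapes are illustrative, not taken from the ASE paper.
import torch
import torch.nn.functional as F


def joint_gate(h: torch.Tensor, w_shared: torch.Tensor,
               w_sparse: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    """h: (tokens, d); w_shared: (d, n_shared); w_sparse: (d, n_sparse)."""
    shared_logits = h @ w_shared                      # (tokens, n_shared)
    sparse_logits = h @ w_sparse                      # (tokens, n_sparse)
    # Keep only the top-k sparse experts per token; mask the rest out.
    topk_vals, topk_idx = sparse_logits.topk(top_k, dim=-1)
    masked_sparse = torch.full_like(sparse_logits, float("-inf"))
    masked_sparse.scatter_(-1, topk_idx, topk_vals)
    # A single softmax over shared + selected sparse experts: the gating
    # weights are normalized jointly rather than per expert group.
    return F.softmax(torch.cat([shared_logits, masked_sparse], dim=-1), dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    gates = joint_gate(torch.randn(4, 8), torch.randn(8, 1), torch.randn(8, 4))
    print(gates.sum(dim=-1))  # each row sums to 1 across both expert groups
```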
This list is automatically generated from the titles and abstracts of the papers on this site.