Mixture-of-Experts Meets In-Context Reinforcement Learning
- URL: http://arxiv.org/abs/2506.05426v3
- Date: Tue, 28 Oct 2025 06:55:14 GMT
- Title: Mixture-of-Experts Meets In-Context Reinforcement Learning
- Authors: Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang
- Abstract summary: In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks. We propose T2MIR, an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. We show that T2MIR significantly facilitates in-context learning capacity and outperforms various types of baselines.
- Score: 49.19791753312034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of the two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly enhances in-context learning capacity and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement to advance ICRL one step closer toward achievements in the language and vision communities. Our code is available at https://github.com/NJU-RL/T2MIR.
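The core architectural idea — replacing the transformer's feedforward layer with two parallel MoE layers whose outputs are concatenated — can be sketched as follows. All class names, dimensions, and the top-1 routing choice are illustrative assumptions, not the paper's actual implementation (which is in the linked repository); the contrastive routing loss is omitted here.

```python
# Minimal sketch of a T2MIR-style transformer block: the FFN is replaced by
# a token-wise MoE and a task-wise MoE running in parallel, with their
# outputs concatenated. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    """Top-1 routed mixture of feedforward experts (a simplifying choice)."""
    def __init__(self, dim, out_dim, n_experts):
        super().__init__()
        self.out_dim = out_dim
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, out_dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (batch, seq, dim)
        gate = F.softmax(self.router(x), dim=-1)        # routing weights
        top_w, top_i = gate.max(dim=-1)                 # top-1 expert per token
        out = torch.zeros(*x.shape[:-1], self.out_dim, device=x.device)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                           # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

class T2MIRBlock(nn.Module):
    """Transformer block whose FFN is two parallel MoEs, concatenated."""
    def __init__(self, dim=64, n_token_experts=4, n_task_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # each MoE emits dim // 2 features so the concatenation keeps width dim
        self.token_moe = MoE(dim, dim // 2, n_token_experts)
        self.task_moe = MoE(dim, dim // 2, n_task_experts)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]                   # self-attention + residual
        h = self.norm2(x)
        # parallel token-wise and task-wise MoEs, outputs concatenated
        return x + torch.cat([self.token_moe(h), self.task_moe(h)], dim=-1)

block = T2MIRBlock()
y = block(torch.randn(2, 10, 64))                       # (batch, tokens, dim)
print(y.shape)
```

In the paper the task-wise router is additionally trained with a contrastive objective that ties each task's router representation to that task, which this sketch does not include.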
Related papers
- Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings [44.77164359074224]
Multimodal Large Language Models (MLLMs) have become pivotal for advancing Universal Multimodal Embeddings (UME). Recent studies demonstrate that incorporating generative Chain-of-Thought (CoT) reasoning can substantially enhance task-specific representations. We propose a reasoning-driven UME framework that integrates Embedder-Guided Reinforcement Learning (EG-RL) to optimize the Reasoner to produce evidential Traceability CoT.
arXiv Detail & Related papers (2026-02-14T15:35:03Z) - From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z) - MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models [52.876185634349575]
We propose to incorporate Mixture of Intra- and Inter-Modality Experts (MoIIE) into Large Vision-Language Models (LVLMs). For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts. Our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLM-based multimodal models.
arXiv Detail & Related papers (2025-08-13T13:00:05Z) - DETACH: Cross-domain Learning for Long-Horizon Tasks via Mixture of Disentangled Experts [6.15749307717446]
DETACH is a cross-domain learning framework for long-horizon (LH) tasks via biologically inspired dual-stream disentanglement. It achieves an average subtask success-rate improvement of 23% and an average execution-efficiency improvement of 29%.
arXiv Detail & Related papers (2025-08-11T10:54:28Z) - Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning [7.361665112773847]
We propose a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method to mitigate catastrophic forgetting. TRGE dynamically expands the pre-trained CLIP model, assigning a specific expert group to each task. We leverage Multimodal Large Language Models (MLLMs), which possess powerful multimodal comprehension capabilities, to generate task descriptions and recognize the correct task identifier.
arXiv Detail & Related papers (2025-08-11T08:18:22Z) - MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across the MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z) - Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning [45.019751165506946]
Continual multimodal instruction tuning is crucial for adapting Multimodal Large Language Models (MLLMs) to evolving tasks. We propose a novel Dynamic Mixture of Curriculum LoRA Experts (D-MoLE) method, which automatically evolves the MLLM's architecture with controlled parameter budgets to continually adapt to new tasks. Specifically, we propose a dynamic layer-wise expert allocator, which automatically allocates LoRA experts across layers to resolve architecture conflicts. Then, we propose a gradient-based inter-modal continual curriculum, which adjusts the update ratio of each module in the MLLM based on the difficulty of each
arXiv Detail & Related papers (2025-06-13T11:03:46Z) - Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts [11.307588007047407]
Multimodal large language models (MLLMs) integrate both understanding and generation tasks within a single framework. However, intrinsic Task Objective Conflicts between high-level semantic abstraction in understanding and fine-grained detail preservation in generation pose significant challenges. We propose a novel approach that decouples internal components of the AR model to resolve task objective conflicts.
arXiv Detail & Related papers (2025-06-04T05:44:21Z) - M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models [11.542439154523647]
We propose M2IV, a method that substitutes explicit demonstrations with learnable Vectors directly integrated into LVLMs. M2IV achieves robust cross-modal fidelity and fine-grained semantic distillation through training. Experiments show that M2IV surpasses vanilla ICL and prior representation-engineering approaches.
arXiv Detail & Related papers (2025-04-06T22:02:21Z) - LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant [63.28378110792787]
We introduce LamRA, a versatile framework designed to empower Large Multimodal Models with sophisticated retrieval and reranking capabilities. For retrieval, we adopt a two-stage training strategy comprising language-only pre-training and multimodal instruction tuning. For reranking, we employ joint training for both pointwise and listwise reranking, offering two distinct ways to further boost the retrieval performance.
arXiv Detail & Related papers (2024-12-02T17:10:16Z) - Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model [16.03304915788997]
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts. Existing methods for JMERE require large amounts of labeled data. We introduce the Knowledge-Enhanced Cross-modal Prompt Model.
arXiv Detail & Related papers (2024-10-18T07:14:54Z) - ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt [51.71932333475573]
Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed instruction datasets. Existing MCIT methods do not fully exploit the unique attributes of LMMs. We propose a novel prompt learning framework for MCIT to effectively alleviate forgetting of previous knowledge.
arXiv Detail & Related papers (2024-10-08T09:35:37Z) - M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework [6.928469290518152]
M3-Jepa is a scalable multimodal alignment framework with a predictor implemented by a multi-directional mixture of experts. We show that M3-Jepa can obtain state-of-the-art performance on different modalities and tasks, generalize to unseen datasets and domains, and is computationally efficient in training and inference.
arXiv Detail & Related papers (2024-09-09T10:40:50Z) - NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored. We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z) - T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multi-task finetuning. We design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts [20.926613438442782]
Multi-Task Reinforcement Learning (MTRL) tackles the problem of endowing agents with skills that generalize across a variety of problems.
To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks.
We introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks while using orthogonal representations to promote diversity.
arXiv Detail & Related papers (2023-11-19T18:09:25Z) - Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z) - HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction [50.40732146978222]
Multi-scenario & multi-task learning has been widely applied to many recommendation systems in industrial applications.
We propose a Hierarchical information extraction Network (HiNet) for multi-scenario and multi-task recommendation.
HiNet achieves a new state-of-the-art performance and significantly outperforms existing solutions.
arXiv Detail & Related papers (2023-03-10T17:24:41Z) - Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.