LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems
- URL: http://arxiv.org/abs/2511.19368v1
- Date: Mon, 24 Nov 2025 18:03:59 GMT
- Title: LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems
- Authors: Tianyang Duan, Zongyuan Zhang, Zheng Lin, Songxiao Guo, Xiuxian Guan, Guangyu Wu, Zihan Fang, Haotian Meng, Xia Du, Ji-Zhe Zhou, Heming Cui, Jun Luo, Yue Gao,
- Abstract summary: Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications.<n>MARL suffers from severe non-stationarity due to the synchronous updates of agent policies.<n>We propose RELED, a scalable MARL framework that integrates large language model (LLM)-driven expert demonstrations with autonomous agent exploration.
- Score: 24.923288853813073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications. While MARL enables decentralized deployment on resource-constrained edge devices, it suffers from severe non-stationarity due to the synchronous updates of agent policies. This non stationarity results in unstable training and poor policy con vergence, especially as the number of agents increases. In this paper, we propose RELED, a scalable MARL framework that integrates large language model (LLM)-driven expert demonstrations with autonomous agent exploration. RELED incorporates a Stationarity-Aware Expert Demonstration module, which leverages theoretical non-stationarity bounds to enhance the quality of LLM-generated expert trajectories, thus providing high reward and training-stable samples for each agent. Moreover, a Hybrid Expert-Agent Policy Optimization module adaptively balances each agent's learning from both expert-generated and agent-generated trajectories, accelerating policy convergence and improving generalization. Extensive experiments with real city networks based on OpenStreetMap demonstrate that RELED achieves superior performance compared to state-of-the-art MARL methods.
Related papers
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2)<n>We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search.<n>We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z) - Agentic Reinforcement Learning with Implicit Step Rewards [92.26560379363492]
Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL)<n>We introduce implicit step rewards for agentic RL (iStar), a general credit-assignment strategy that integrates seamlessly with standard RL algorithms.<n>We evaluate our method on three challenging agent benchmarks, including WebShop and VisualSokoban, as well as open-ended social interactions with unverifiable rewards in SOTOPIA.
arXiv Detail & Related papers (2025-09-23T16:15:42Z) - LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning [17.656443715585343]
Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments.<n>We propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED)<n>LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module.<n> Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-18T07:19:24Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.32591749156416]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL)<n>This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z) - How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs [16.853362180877593]
We introduce a hierarchical multi-agent framework that trains only a single leader LLM to coordinate a team of untrained peer agents.<n>Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
arXiv Detail & Related papers (2025-07-11T18:34:07Z) - Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems [25.882461853973897]
We propose Multi-Agent Heterogeneous Group Policy Optimization (MHGPO), which guides policy updates by estimating relative reward advantages.<n>MHGPO eliminates the need for Critic networks, enhancing stability and reducing computational overhead.<n>We also introduce three group rollout sampling strategies that trade off between efficiency and effectiveness.
arXiv Detail & Related papers (2025-06-03T10:17:19Z) - RLAE: Reinforcement Learning-Assisted Ensemble for LLMs [21.77261258691006]
Large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks.<n>We propose Reinforcement Learning-Assisted Ensemble for LLMs, a novel framework that reformulates ensemble through the lens of a Markov Decision Process (MDP)<n>Our approach introduces a RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states.
arXiv Detail & Related papers (2025-05-31T07:38:41Z) - Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training.<n>We synthesize self-reflected trajectories that include reflections and corrections of error steps.<n>Experiments demonstrate that our method improves agent performance across three representative tasks.
arXiv Detail & Related papers (2025-05-26T14:11:12Z) - Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimize both task-planning and self-reflective evolution capabilities in language agents.<n>We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning.<n>We conduct extensive experiments across various testing environments, demonstrating RetroAct has substantial improvements in task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.<n>We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.