Related papers: Multi-Agent Guided Policy Optimization

URL: http://arxiv.org/abs/2507.18059v1
Date: Thu, 24 Jul 2025 03:22:21 GMT
Title: Multi-Agent Guided Policy Optimization
Authors: Yueheng Li, Guangming Xie, Zongqing Lu
Abstract summary: Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL). We propose Multi-Agent Guided Policy Optimization (MAGPO), a novel framework that better leverages centralized training by integrating centralized guidance with decentralized execution.
Score: 36.853129816484845
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL). However, existing CTDE methods often underutilize centralized training or lack theoretical guarantees. We propose Multi-Agent Guided Policy Optimization (MAGPO), a novel framework that better leverages centralized training by integrating centralized guidance with decentralized execution. MAGPO uses an auto-regressive joint policy for scalable, coordinated exploration and explicitly aligns it with decentralized policies to ensure deployability under partial observability. We provide theoretical guarantees of monotonic policy improvement and empirically evaluate MAGPO on 43 tasks across 6 diverse environments. Results show that MAGPO consistently outperforms strong CTDE baselines and matches or surpasses fully centralized approaches, offering a principled and practical solution for decentralized multi-agent learning. Our code and experimental data can be found in https://github.com/liyheng/MAGPO.

Related papers

Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning [0.11650821883155184]
We propose Centralized Permutation Equivariant (CPE) learning, a centralized training and execution framework that employs a fully centralized policy to overcome limitations. Our approach leverages a novel permutation equivariant architecture, Global-Local Permutation Equivariant (GLPE) networks, that is lightweight, scalable, and easy to implement.
arXiv Detail & Related papers (2025-08-13T22:10:37Z)
Imitation Learning based Alternative Multi-Agent Proximal Policy Optimization for Well-Formed Swarm-Oriented Pursuit Avoidance [15.498559530889839]
In this paper, we put forward a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm to execute the pursuit avoidance task in a well-formed swarm. We utilize imitation learning to decentralize the formation controller, so as to reduce the communication overhead and enhance scalability. The simulation results validate the effectiveness of IA-MAPPO, and extensive ablation experiments further show performance comparable to a centralized solution with a significant decrease in communication overhead.
arXiv Detail & Related papers (2023-11-06T06:58:16Z)
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? [34.00244359590573]
Centralized Training with Decentralized Execution is a popular framework for cooperative Multi-Agent Reinforcement Learning. We introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning.
arXiv Detail & Related papers (2023-05-27T03:15:24Z)
Decentralized Policy Optimization [21.59254848913971]
We propose decentralized policy optimization (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees. Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
arXiv Detail & Related papers (2022-11-06T05:38:23Z)
More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization [21.10461189367695]
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn policies. Agents are commonly assumed to be independent of each other, even in centralized training. We propose multi-agent conditional policy factorization (MACPF), which takes more centralized training but still enables decentralized execution.
arXiv Detail & Related papers (2022-09-26T13:29:22Z)
Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL). We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function. IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications. We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings. Our framework can achieve scalability and stability for large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.