Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
- URL: http://arxiv.org/abs/2602.18291v1
- Date: Fri, 20 Feb 2026 15:38:02 GMT
- Title: Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
- Authors: Zhuoran Li, Hai Zhong, Xun Wang, Qingxin Xia, Lihua Zhang, Longbo Huang
- Abstract summary: Diffusion-based generative models are well-positioned to meet the needs of online Multi-Agent Reinforcement Learning (MARL). We propose among the first \underline{O}nline off-policy \underline{MA}RL frameworks using \underline{D}iffusion policies to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihoods.
- Score: 51.24079409973799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination, and enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose among the first \underline{O}nline off-policy \underline{MA}RL frameworks using \underline{D}iffusion policies (\textbf{OMAD}) to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihoods. Complementing this, within the centralized training with decentralized execution (CTDE) paradigm, we employ a joint distributional value function to optimize decentralized diffusion policies: it leverages tractable entropy-augmented targets to guide the simultaneous updates of the diffusion policies, thereby ensuring stable coordination. Extensive evaluations on MPE and MAMuJoCo establish our method as the new state-of-the-art across $10$ diverse tasks, demonstrating a remarkable $2.5\times$ to $5\times$ improvement in sample efficiency.
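The abstract's central mechanism, exploring via a scaled joint-entropy objective without ever evaluating the diffusion policy's intractable likelihood, can be made concrete with a minimal sketch. The code below is an illustration under assumptions, not the authors' implementation: the paper does not specify its entropy estimator here, so a k-nearest-neighbor entropy proxy over sampled actions is used as one likelihood-free stand-in, and the critic interface and all names are hypothetical.

import torch

def knn_entropy_proxy(actions: torch.Tensor, k: int = 5) -> torch.Tensor:
    # Likelihood-free entropy proxy (up to additive constants): the average
    # log-distance to the k-th nearest neighbor among N actions sampled from
    # the diffusion policy for a single observation. No density is evaluated.
    dists = torch.cdist(actions, actions)                # (N, N) pairwise distances
    kth = dists.topk(k + 1, largest=False).values[:, k]  # k-th neighbor, skipping self (distance 0)
    return kth.clamp_min(1e-8).log().mean()

def relaxed_policy_loss(critic, obs, sampled_actions, alpha=0.2):
    # Relaxed objective: maximize Q plus a scaled entropy proxy; the optimizer
    # minimizes, so the sum is negated.
    # obs: (N, obs_dim), one observation repeated N times;
    # sampled_actions: (N, act_dim), drawn from the diffusion policy and kept
    # differentiable (e.g., by backpropagating through the denoising steps).
    q = critic(obs, sampled_actions).mean()
    return -(q + alpha * knn_entropy_proxy(sampled_actions))

Under CTDE, the same loss would plausibly be applied per agent, with critic replaced by the joint distributional value function evaluated at all agents' simultaneously sampled actions, and the entropy-augmented critic targets adding an analogous proxy term to the Bellman backup.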
Related papers
- Dichotomous Diffusion Policy Optimization [46.51375996317989]
DIPOLE is a novel RL algorithm designed for stable and controllable diffusion policy optimization. We also use DIPOLE to train a large vision-language-action model for end-to-end autonomous driving.
arXiv Detail & Related papers (2025-12-31T16:56:56Z)
- Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner [16.759740918605768]
In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing Quality of Service (QoS). In contrast to centralized approaches, the Distributed Training with Decentralized Execution (DTDE) paradigm enables distributed learning and decision-making. We propose the Multi-Agent Conditional Diffusion Model Planner (MACDMP) for decentralized communication resource management.
arXiv Detail & Related papers (2025-10-27T03:42:18Z)
- Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces [57.466101098183884]
Reinforcement learning (RL) struggles to scale to the large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in such complex settings.
arXiv Detail & Related papers (2025-09-26T21:53:36Z)
- Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts. This work is the first to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
- Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization [8.877649895977479]
Offline Multi-Agent Reinforcement Learning (MARL) is an emerging field that aims to learn optimal multi-agent policies from pre-collected datasets. In this work, we revisit existing offline MARL methods and show that in certain scenarios they can be problematic. We propose a new offline MARL algorithm, named In-Sample Sequential Policy Optimization (InSPO).
arXiv Detail & Related papers (2024-12-10T16:19:08Z)
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
MARL presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO). Specifically, we introduce the Q-weighted variational loss, which is provably a tight lower bound of the policy objective in online RL under certain conditions (see the illustrative sketch after this list). We also develop an efficient behavior policy that enhances sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework. It works as both a decentralized policy and a centralized controller. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
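As a point of contrast with OMAD's likelihood-free entropy objective, the Q-weighted variational loss named in the QVPO entry above admits a compact sketch. Again, this is an illustrative reconstruction, not the paper's code: the abstract only says the variational (denoising) loss is Q-weighted, so the weight transform (a simple non-negative clip here) and all names are assumptions.

import torch

def q_weighted_variational_loss(eps_pred: torch.Tensor,
                                eps_true: torch.Tensor,
                                q_values: torch.Tensor) -> torch.Tensor:
    # Standard diffusion denoising (noise-prediction) MSE on actions sampled
    # from the policy, reweighted by transformed Q-values so that high-value
    # actions receive larger gradient weight.
    # eps_pred, eps_true: (N, act_dim) predicted vs. true noise for N sampled
    # (state, action, timestep) tuples; q_values: (N,).
    weights = q_values.clamp_min(0.0)                       # assumed transform: clip to non-negative
    per_sample = (eps_pred - eps_true).pow(2).mean(dim=-1)  # per-sample denoising MSE
    return (weights * per_sample).mean()                    # Q-weighted variational objective

Upweighting high-Q samples pushes the diffusion model's density toward high-value actions while retaining the tractable denoising objective, which is how this family of methods sidesteps the intractable policy likelihood.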
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.