ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
- URL: http://arxiv.org/abs/2410.01954v1
- Date: Wed, 2 Oct 2024 18:56:10 GMT
- Title: ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
- Authors: The Viet Bui, Thanh Hong Nguyen, Tien Mai,
- Abstract summary: offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
MARL presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
- Score: 11.620274237352026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks.
Related papers
- Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning [24.501511979962746]
offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly.
We present EAQ, Episodes Augmentation guided by Q-total loss, a novel approach for offline MARL framework utilizing diffusion models.
arXiv Detail & Related papers (2024-08-23T14:17:17Z) - AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline
Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z) - Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local
Value Regularization [23.416448404647305]
OMIGA is a new offline m ulti-agent RL algorithm with implicit global-to-local v alue regularization.
We show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
arXiv Detail & Related papers (2023-07-21T14:37:54Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement
Learning with Actor Rectification [74.10976684469435]
offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z) - Regularize! Don't Mix: Multi-Agent Reinforcement Learning without
Explicit Centralized Structures [8.883885464358737]
We propose using regularization for Multi-Agent Reinforcement Learning rather than learning explicit cooperative structures called em Multi-Agent Regularized Q-learning (MARQ)
Our algorithm is evaluated on several benchmark multi-agent environments and we show that MARQ consistently outperforms several baselines and state-of-the-art algorithms.
arXiv Detail & Related papers (2021-09-19T00:58:38Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn)
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes unpractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting.
Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.