Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local
Value Regularization
- URL: http://arxiv.org/abs/2307.11620v2
- Date: Tue, 7 Nov 2023 11:13:56 GMT
- Title: Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local
Value Regularization
- Authors: Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan
- Abstract summary: OMIGA is a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
We show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
- Score: 23.416448404647305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) has received considerable attention in
recent years due to its attractive capability of learning policies from offline
datasets without environmental interactions. Despite some success in the
single-agent setting, offline multi-agent RL (MARL) remains a challenge.
The large joint state-action space and the coupled multi-agent behaviors pose
extra complexities for offline policy optimization. Most existing offline MARL
studies simply apply offline data-related regularizations on individual agents,
without fully considering the multi-agent system at the global level. In this
work, we present OMIGA, a new offline multi-agent RL algorithm with implicit
global-to-local value regularization. OMIGA provides a principled framework to
convert global-level value regularization into equivalent implicit local value
regularizations and simultaneously enables in-sample learning, thus elegantly
bridging multi-agent value decomposition and policy learning with offline
regularizations. Based on comprehensive experiments on the offline multi-agent
MuJoCo and StarCraft II micro-management tasks, we show that OMIGA achieves
superior performance over the state-of-the-art offline MARL methods in almost
all tasks.
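
As a concrete, heavily simplified illustration of the ingredients the abstract describes -- local value functions mixed into a global value, in-sample value learning, and local policy extraction -- the sketch below combines a generic linear, state-conditioned mixer with IQL-style expectile and advantage-weighted updates. This is not OMIGA's actual algorithm or its implicit regularization; all network shapes, the softplus mixing weights, and hyper-parameters (tau, alpha, gamma) are assumptions made purely for the example.

```python
# Illustrative sketch only -- NOT OMIGA's actual algorithm. It shows how local
# critics, a global mixing network, and advantage-weighted local policies can
# be trained from offline data without ever querying out-of-distribution
# actions. All shapes and hyper-parameters below are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, ACT_DIM, STATE_DIM = 3, 17, 6, 51  # assumed dimensions


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


class LocalCritic(nn.Module):
    """Per-agent Q(o_i, a_i) and V(o_i) heads."""
    def __init__(self):
        super().__init__()
        self.q = mlp(OBS_DIM + ACT_DIM, 1)
        self.v = mlp(OBS_DIM, 1)


critics = [LocalCritic() for _ in range(N_AGENTS)]
policies = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]  # deterministic heads, for brevity
mixer_w = mlp(STATE_DIM, N_AGENTS)                           # state-conditioned mixing weights
mixer_b = mlp(STATE_DIM, 1)


def expectile_loss(diff, tau=0.7):
    # Asymmetric L2: pushes V toward an upper expectile of Q over dataset actions.
    weight = torch.where(diff > 0, torch.full_like(diff, tau), torch.full_like(diff, 1 - tau))
    return (weight * diff.pow(2)).mean()


def losses(batch, alpha=3.0, gamma=0.99):
    obs, acts, state, rew, next_obs = (batch[k] for k in ("obs", "acts", "state", "rew", "next_obs"))

    # 1) In-sample local value learning: V_i is regressed toward Q_i at dataset actions only.
    q_i = torch.cat([c.q(torch.cat([obs[:, i], acts[:, i]], -1)) for i, c in enumerate(critics)], -1)
    v_i = torch.cat([c.v(obs[:, i]) for i, c in enumerate(critics)], -1)
    v_loss = expectile_loss(q_i.detach() - v_i)

    # 2) Global TD target through the mixer (credit assignment at the global level).
    w = F.softplus(mixer_w(state))                                # positive mixing weights
    q_tot = (w * q_i).sum(-1, keepdim=True) + mixer_b(state)
    with torch.no_grad():
        next_v_i = torch.cat([c.v(next_obs[:, i]) for i, c in enumerate(critics)], -1)
        # Simplification: re-use current-state mixer outputs for the bootstrap target.
        target = rew + gamma * ((w * next_v_i).sum(-1, keepdim=True) + mixer_b(state))
    q_loss = F.mse_loss(q_tot, target)

    # 3) Local policies extracted by advantage-weighted regression on dataset actions.
    adv_w = torch.clamp(torch.exp(alpha * (q_i - v_i).detach()), max=100.0)
    pi_loss = sum((adv_w[:, i:i + 1] *
                   (policies[i](obs[:, i]) - acts[:, i]).pow(2).mean(-1, keepdim=True)).mean()
                  for i in range(N_AGENTS))
    return v_loss, q_loss, pi_loss                # optimize with separate optimizers in practice
```

What this sketch shares with the abstract's in-sample idea is that neither the value targets nor the policy loss ever evaluates actions outside the dataset, which is what keeps offline extrapolation error in check; OMIGA's actual derivation of the implicit local regularizers is given in the paper itself.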
Related papers
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
MARL presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift (a generic DICE-style objective of this form is sketched after this list).
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization [2.66512000865131]
We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization.
OMAC performs in-sample learning on the local state-value functions, which implicitly carries out the max-Q operation at the local level.
We demonstrate the superior performance of OMAC over the state-of-the-art offline multi-agent RL methods.
arXiv Detail & Related papers (2023-06-15T07:08:41Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained by offline MARL can inherit a random policy present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline Meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
One may expect that offline reinforcement learning (RL) algorithms transfer to multi-agent settings directly.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
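
The ComaDICE and AlberDICE entries above both start from stationary-distribution ("DICE"-style) formulations. As a reference point only (the multi-agent variants in those papers differ in their decomposition and optimization), the generic single-agent objective behind such regularizers can be written as:

```latex
\max_{d \,\ge\, 0}\;
\mathbb{E}_{(s,a)\sim d}\!\left[r(s,a)\right]
\;-\; \alpha\, D_f\!\left(d \,\Vert\, d^{\mathcal{D}}\right)
\qquad \text{s.t.} \qquad
\sum_{a} d(s,a)
\;=\; (1-\gamma)\,\mu_0(s)
\;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
\quad \forall s,
```

where $d$ is the discounted stationary state-action distribution induced by the learned policy, $d^{\mathcal{D}}$ is the dataset distribution, $D_f$ is an $f$-divergence, $\mu_0$ is the initial-state distribution, and $\gamma$ is the discount factor. Regularizing in this distribution space, rather than constraining each agent's actions separately, is the handling of distributional shift that the ComaDICE summary refers to.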