Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement
Learning with Actor Rectification
- URL: http://arxiv.org/abs/2111.11188v1
- Date: Mon, 22 Nov 2021 13:27:42 GMT
- Title: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement
Learning with Actor Rectification
- Authors: Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
- Abstract summary: One might expect offline reinforcement learning (RL) algorithms to transfer to multi-agent settings directly, but conservatism-based methods degrade as the number of agents increases.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
- Score: 74.10976684469435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The idea of conservatism has led to significant progress in offline
reinforcement learning (RL) where an agent learns from pre-collected datasets.
However, how to resolve offline RL in the more practical multi-agent setting,
where many real-world scenarios involve interaction among multiple agents,
remains an open question. Given the recent success of transferring online RL
algorithms to the multi-agent setting, one may expect that offline RL
algorithms will also transfer to multi-agent settings directly. Surprisingly,
when conservatism-based algorithms are applied to the multi-agent setting, the
performance degrades significantly with an increasing number of agents. Towards
mitigating the degradation, we identify a key issue: the landscape of the value
function can be non-concave, and policy gradient improvements are prone to
local optima. Multiple agents exacerbate the problem, since a suboptimal policy
by any single agent can lead to uncoordinated global failure.
Following this intuition, we propose a simple yet effective method, Offline
Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical
challenge via an effective combination of first-order policy gradient and
zeroth-order optimization methods for the actor to better optimize the
conservative value function. Despite its simplicity, OMAR significantly
outperforms strong baselines and achieves state-of-the-art performance on
multi-agent continuous control benchmarks.
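The actor update described above can be sketched concretely. The snippet below is a minimal sketch, assuming a DDPG-style continuous-control setup: a first-order policy-gradient term on a conservative critic is blended with a rectification term that pulls the actor toward the best action found by a simple zeroth-order (CEM-like) sampler. The Actor network, the critic(obs, act) interface, the sampler settings, and the mixing weight tau are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of actor rectification: a first-order policy-gradient term on a
# conservative critic plus a zeroth-order (sampling-based) rectification term.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def zeroth_order_best_action(critic, obs, act_dim, n_samples=32, iters=3, sigma=0.3):
    """CEM-like search: repeatedly sample actions around the current mean and
    keep, per state, the candidate the conservative critic scores highest."""
    batch = obs.shape[0]
    mean = torch.zeros(batch, act_dim)
    for _ in range(iters):
        noise = sigma * torch.randn(batch, n_samples, act_dim)
        cands = (mean.unsqueeze(1) + noise).clamp(-1.0, 1.0)      # (B, N, A)
        obs_rep = obs.unsqueeze(1).expand(-1, n_samples, -1)      # (B, N, O)
        q = critic(obs_rep.reshape(-1, obs.shape[-1]),
                   cands.reshape(-1, act_dim)).reshape(batch, n_samples)
        mean = cands[torch.arange(batch), q.argmax(dim=1)]        # best per state
    return mean.detach()


def rectified_actor_loss(actor, critic, obs, tau=0.7):
    """Blend the first-order objective (raise Q at the actor's own action) with
    a zeroth-order rectification term toward the sampled best action."""
    a_pi = actor(obs)
    pg_term = -critic(obs, a_pi).mean()            # first-order policy gradient
    a_star = zeroth_order_best_action(critic, obs, a_pi.shape[-1])
    rect_term = ((a_pi - a_star) ** 2).mean()      # pull toward better action
    return tau * pg_term + (1.0 - tau) * rect_term
```

The rectification term is what counteracts the local-optima issue identified above: even when the policy gradient stalls at a poor local optimum of the non-concave value landscape, the zeroth-order search can still surface a better action for the actor to imitate.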
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Counterfactual Conservative Q Learning for Offline Multi-agent
Reinforcement Learning [54.788422270960496]
We propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL).
CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation.
We prove that it still enjoys the underestimation property and the performance guarantee of single-agent conservative methods, while the induced regularization and the safe policy improvement bound are independent of the number of agents (a minimal sketch of this per-agent counterfactual penalty appears after this list).
arXiv Detail & Related papers (2023-09-22T08:10:25Z) - Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local
Value Regularization [23.416448404647305]
OMIGA is a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
We show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
arXiv Detail & Related papers (2023-07-21T14:37:54Z) - Offline Multi-Agent Reinforcement Learning with Coupled Value
Factorization [2.66512000865131]
We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization.
OMAC performs in-sample learning on the local state-value functions, which implicitly conducts max-Q operation at the local level.
We demonstrate the superior performance of OMAC over the state-of-the-art offline multi-agent RL methods.
arXiv Detail & Related papers (2023-06-15T07:08:41Z) - Multi-agent Policy Reciprocity with Theoretical Guarantee [24.65151626601257]
We propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states.
Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.
arXiv Detail & Related papers (2023-04-12T06:27:10Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
In practice, some behavior policies in a dataset can be random while others perform reasonably well; an agent learned by offline MARL often inherits such a random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - Believe What You See: Implicit Constraint Approach for Offline
Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z) - Scalable Multi-Agent Inverse Reinforcement Learning via
Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
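For the CFCQL entry above, the summary's description of a per-agent counterfactual conservative penalty can be sketched as follows. This is a hedged reconstruction from the summary alone, assuming a CQL-style penalty in which only agent i's action is replaced by a policy sample while the other agents keep their dataset actions; the critic(obs, joint_actions) interface, the policies list, and the uniform weights are assumptions, not the paper's code.

```python
# Hedged sketch of a per-agent counterfactual conservative penalty, following
# the CFCQL summary: compute a CQL-style gap for each agent by replacing only
# that agent's dataset action with a policy sample, then linearly combine the
# gaps. The interfaces and uniform weights are illustrative assumptions.
import torch


def counterfactual_conservative_penalty(critic, policies, obs, data_actions, weights=None):
    """obs: (B, obs_dim); data_actions: (B, n_agents, act_dim);
    policies[i](obs) -> (B, act_dim). Returns a scalar penalty for the critic loss."""
    n_agents = data_actions.shape[1]
    if weights is None:
        weights = [1.0 / n_agents] * n_agents        # simple linear combination
    q_data = critic(obs, data_actions).mean()        # Q on in-dataset joint actions
    penalty = torch.zeros(())
    for i, pi_i in enumerate(policies):
        counterfactual = data_actions.clone()
        counterfactual[:, i] = pi_i(obs).detach()    # swap only agent i's action
        q_cf = critic(obs, counterfactual).mean()    # counterfactual Q-value
        penalty = penalty + weights[i] * (q_cf - q_data)
    return penalty
```

Because each term perturbs only one agent's action, the penalty never requires sampling over the joint action space of all agents, which is consistent with the summary's claim that the induced regularization is independent of the number of agents.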