Counterfactual Conservative Q Learning for Offline Multi-agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2309.12696v1
- Date: Fri, 22 Sep 2023 08:10:25 GMT
- Title: Counterfactual Conservative Q Learning for Offline Multi-agent
Reinforcement Learning
- Authors: Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji
- Abstract summary: We propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL)
CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation.
We prove that it still enjoys the underestimation property and the performance guarantee that single-agent conservative methods do, but the induced regularization and safe policy improvement bound are independent of the agent number.
- Score: 54.788422270960496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline multi-agent reinforcement learning is challenging due to the
coupling of the distribution shift issue common in the offline setting and the
high-dimensionality issue common in the multi-agent setting, which makes the
out-of-distribution (OOD) action and value overestimation phenomena excessively
severe. To mitigate this problem, we propose a novel multi-agent offline RL
algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct
conservative value estimation. Rather than treating all the agents as a single
high-dimensional agent and directly applying single-agent methods to it, CFCQL
calculates conservative regularization for each agent separately in a
counterfactual way and then linearly combines them to realize an overall
conservative value estimation. We prove that it still enjoys the
underestimation property and the performance guarantee that single-agent
conservative methods do, but the induced regularization and safe policy
improvement bound are independent of the agent number, making it
theoretically superior to the direct treatment referred to above, especially
when the agent number is large. We further conduct experiments on four
environments, covering both discrete and continuous action settings, on both
existing and our self-constructed datasets, demonstrating that CFCQL
outperforms existing methods on most datasets, in some cases by a remarkable
margin.
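To make the counterfactual regularizer described above concrete, below is a minimal sketch assuming a toy centralized critic (JointQ), discrete actions enumerated as each agent's out-of-distribution proposal, and equal combination weights; these choices are illustrative assumptions made for readability, not the paper's implementation.

```python
# Illustrative sketch of a per-agent counterfactual conservative penalty,
# linearly combined across agents. NOT the authors' implementation: the critic
# architecture, the uniform action enumeration, and the equal weights are
# assumptions for this example.
import torch
import torch.nn as nn


class JointQ(nn.Module):
    """Toy centralized critic Q(s, a_1, ..., a_n) for n agents with discrete actions."""

    def __init__(self, state_dim: int, n_agents: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_agents, self.n_actions = n_agents, n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, actions):
        # state: (batch, state_dim); actions: (batch, n_agents) integer (long) action ids
        onehot = torch.nn.functional.one_hot(actions, self.n_actions).float()
        return self.net(torch.cat([state, onehot.flatten(1)], dim=-1)).squeeze(-1)


def counterfactual_conservative_penalty(q_net, state, data_actions, weights=None):
    """Compute each agent's counterfactual penalty and linearly combine them.

    For agent i, only agent i's action is replaced by counterfactual candidates
    (here: every discrete action) while the other agents keep their dataset
    actions; the per-agent penalty is logsumexp_i Q - Q(dataset actions).
    """
    n_agents, n_actions = q_net.n_agents, q_net.n_actions
    q_data = q_net(state, data_actions)                    # (batch,)
    per_agent = []
    for i in range(n_agents):
        q_cf = []
        for a in range(n_actions):                         # enumerate agent i's actions
            actions = data_actions.clone()
            actions[:, i] = a                              # counterfactual action for agent i only
            q_cf.append(q_net(state, actions))
        q_cf = torch.stack(q_cf, dim=-1)                   # (batch, n_actions)
        per_agent.append(torch.logsumexp(q_cf, dim=-1) - q_data)
    per_agent = torch.stack(per_agent, dim=-1)             # (batch, n_agents)
    if weights is None:                                    # assumption: equal combination weights
        weights = torch.full((n_agents,), 1.0 / n_agents)
    return (per_agent * weights).sum(-1).mean()            # scalar regularizer
```

In a full training loop this penalty would be scaled by a coefficient and added to the centralized critic's usual TD loss; the per-agent combination weights need not be uniform.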
Related papers
- Factored Online Planning in Many-Agent POMDPs [8.728372851272727]
In centralized multi-agent systems, the action and observation spaces grow exponentially with the number of agents.
We introduce weighted particle filtering to a sample-based online planner for MPOMDPs.
We also present a scalable approximation of the belief.
arXiv Detail & Related papers (2023-12-18T18:35:30Z)
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent learned by offline MARL can inherit the random behavior of poorly performing teammates in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
One may expect offline reinforcement learning (RL) algorithms to transfer directly to multi-agent settings, yet in practice this proves difficult.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees; a schematic form of this conservative objective is sketched after this list.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
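As referenced in the Conservative Q-Learning entry above, a schematic form of the single-agent conservative objective (the CQL(H) instance, with notation lightly simplified for illustration; see the CQL paper for the precise statement) is:

```latex
% Schematic single-agent CQL(H) objective (simplified notation):
\min_{Q} \;
  \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \Big[
      \log \sum_{a} \exp Q(s, a)
      \; - \; \mathbb{E}_{a \sim \hat{\pi}_\beta(\cdot \mid s)} \big[ Q(s, a) \big]
  \Big]
  \; + \; \tfrac{1}{2} \, \mathbb{E}_{(s, a, s') \sim \mathcal{D}} \Big[
      \big( Q(s, a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s, a) \big)^{2}
  \Big]
```

The first term pushes Q-values down on broadly sampled actions (via the log-sum-exp) while pushing them up on dataset actions drawn from the behavior policy $\hat{\pi}_\beta$, which yields the lower-bound property; CFCQL applies this idea per agent in a counterfactual way rather than over the joint action space.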