Offline Multi-Agent Reinforcement Learning with Coupled Value
Factorization
- URL: http://arxiv.org/abs/2306.08900v1
- Date: Thu, 15 Jun 2023 07:08:41 GMT
- Title: Offline Multi-Agent Reinforcement Learning with Coupled Value
Factorization
- Authors: Xiangsen Wang, Xianyuan Zhan
- Abstract summary: We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization.
OMAC performs in-sample learning on the local state-value functions, which implicitly conducts a max-Q operation at the local level.
We demonstrate the superior performance of OMAC over the state-of-the-art offline multi-agent RL methods.
- Score: 2.66512000865131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) that learns policies from offline
datasets without environment interaction has received considerable attention in
recent years. Compared with the rich literature in the single-agent case,
offline multi-agent RL is still a relatively underexplored area. Most existing
methods directly apply offline RL ingredients in the multi-agent setting
without fully leveraging the decomposable problem structure, leading to less
satisfactory performance in complex tasks. We present OMAC, a new offline
multi-agent RL algorithm with coupled value factorization. OMAC adopts a
coupled value factorization scheme that decomposes the global value function
into local and shared components, and also maintains the credit assignment
consistency between the state-value and Q-value functions. Moreover, OMAC
performs in-sample learning on the decomposed local state-value functions,
which implicitly conducts a max-Q operation at the local level while avoiding
the distributional shift caused by evaluating out-of-distribution actions.
Comprehensive evaluations on offline multi-agent StarCraft II
micro-management tasks demonstrate the superior performance of OMAC over
state-of-the-art offline multi-agent RL methods.
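To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch written from the abstract alone; it is not the authors' implementation, and the module names, network sizes, and the expectile-style regression are illustrative assumptions. The coupling is that the global Q-value and state-value are mixed from per-agent components with the same state-conditioned credit weights, and each local state-value is trained purely in-sample against Q-values of dataset actions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledValueFactorization(nn.Module):
    """Sketch: Q_tot and V_tot are mixed from local components with the SAME
    state-conditioned, non-negative credit weights, keeping credit assignment
    consistent between the Q-value and state-value functions."""

    def __init__(self, n_agents, obs_dim, act_dim, state_dim, hidden=64):
        super().__init__()
        self.q_i = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1)) for _ in range(n_agents)])
        self.v_i = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1)) for _ in range(n_agents)])
        # Shared components: credit weights and bias produced from the state.
        self.w = nn.Linear(state_dim, n_agents)
        self.b = nn.Linear(state_dim, 1)

    def forward(self, state, obs, acts):
        # state: (B, state_dim); obs: (B, n, obs_dim); acts: (B, n, act_dim)
        q_loc = torch.cat([q(torch.cat([obs[:, i], acts[:, i]], dim=-1))
                           for i, q in enumerate(self.q_i)], dim=-1)  # (B, n)
        v_loc = torch.cat([v(obs[:, i])
                           for i, v in enumerate(self.v_i)], dim=-1)  # (B, n)
        w = F.softplus(self.w(state))          # non-negative credit weights
        b = self.b(state)
        q_tot = (w * q_loc).sum(-1, keepdim=True) + b
        v_tot = (w * v_loc).sum(-1, keepdim=True) + b  # same w and b: coupled
        return q_tot, v_tot, q_loc, v_loc

def in_sample_v_loss(q_loc, v_loc, tau=0.7):
    """Expectile regression of each V_i toward Q_i at dataset actions only.
    With tau > 0.5 this pushes V_i toward the upper envelope of
    in-distribution Q_i values: an implicit local max-Q that never
    evaluates out-of-distribution actions."""
    diff = q_loc.detach() - v_loc
    weight = torch.abs(tau - (diff < 0).float())  # tau if diff >= 0 else 1-tau
    return (weight * diff.pow(2)).mean()
```
Because each V_i's regression target is always Q_i at an action that actually occurs in the dataset, no out-of-distribution action is ever queried, which is the sense in which in-sample learning avoids distributional shift.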
Related papers
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
Multi-agent RL (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning [54.788422270960496]
We propose CounterFactual Conservative Q-Learning (CFCQL), a novel multi-agent offline RL algorithm.
CFCQL computes a conservative regularization for each agent separately in a counterfactual way, then linearly combines the per-agent terms into an overall conservative value estimate (a sketch of this construction appears after this list).
We prove that CFCQL still enjoys the underestimation property and performance guarantees of single-agent conservative methods, while the induced regularization and safe policy improvement bound are independent of the number of agents.
arXiv Detail & Related papers (2023-09-22T08:10:25Z)
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization [23.416448404647305]
OMIGA is a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
We show that OMIGA achieves superior performance over state-of-the-art offline MARL methods in almost all tasks.
arXiv Detail & Related papers (2023-07-21T14:37:54Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
Because such datasets can mix good and near-random behavior, an agent learned by offline MARL often inherits a random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Conservatism-based offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly, but such direct transfer often performs poorly in practice.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion, mixing per-agent utilities into a joint value through a state-conditioned monotonic network (see the mixing-network sketch after this list).
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
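Picking up the forward reference from the CFCQL entry: a minimal sketch of a per-agent counterfactual conservative penalty for discrete actions, inferred from the summary rather than the paper's code. The centralized critic `q_fn`, the uniform combination weights, and the exhaustive action enumeration are all assumptions.
```python
import torch

def cfcql_style_penalty(q_fn, state, data_actions, n_actions_per_agent,
                        weights=None):
    """Sketch of a CFCQL-style conservative term (discrete actions).

    For each agent i, only agent i's action is varied over its own action
    set while the other agents keep their dataset actions (the
    'counterfactual' part); a CQL-style logsumexp minus the dataset Q gives
    agent i's penalty, and the per-agent penalties are combined linearly.
    q_fn(state, joint_actions) -> (B, 1) is an assumed centralized critic;
    data_actions is a (B, n_agents) LongTensor of dataset actions.
    """
    B, n_agents = data_actions.shape
    if weights is None:
        weights = [1.0 / n_agents] * n_agents    # linear combination
    q_data = q_fn(state, data_actions)           # (B, 1)
    total = state.new_zeros(())
    for i in range(n_agents):
        q_cf = []
        for a in range(n_actions_per_agent[i]):
            acts = data_actions.clone()
            acts[:, i] = a                       # counterfactual swap of agent i only
            q_cf.append(q_fn(state, acts))
        q_cf = torch.cat(q_cf, dim=-1)           # (B, |A_i|)
        gap = torch.logsumexp(q_cf, dim=-1, keepdim=True) - q_data
        total = total + weights[i] * gap.mean()
    return total                                 # add to the critic's TD loss
```
Varying one agent's action at a time keeps each logsumexp over a single agent's action set rather than the exponentially large joint action space, which matches the summary's claim that the induced regularization does not grow with the number of agents.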
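And the mixing-network sketch promised in the QMIX entry: a minimal monotonic mixer. Hypernetworks conditioned on the global state generate the mixing weights, and taking their absolute values keeps dQ_tot/dQ_i >= 0, so each agent's greedy local action is consistent with the greedy joint action. Layer sizes and names are illustrative; this is not the reference implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: Q_tot is a state-conditioned, monotonically
    increasing function of the per-agent Q-values."""

    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        # Hypernetworks: the global state generates the mixing parameters.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)
        self.hyper_b1 = nn.Linear(state_dim, embed)
        self.hyper_w2 = nn.Linear(state_dim, embed)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed), nn.ReLU(),
                                      nn.Linear(embed, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (B, n_agents), state: (B, state_dim)
        B = agent_qs.size(0)
        w1 = self.hyper_w1(state).abs().view(B, self.n_agents, self.embed)
        b1 = self.hyper_b1(state).view(B, 1, self.embed)
        # |w| >= 0 and ELU are both monotone, so dQ_tot/dQ_i >= 0 holds.
        h = F.elu(torch.bmm(agent_qs.view(B, 1, -1), w1) + b1)  # (B, 1, embed)
        w2 = self.hyper_w2(state).abs().view(B, self.embed, 1)
        b2 = self.hyper_b2(state).view(B, 1, 1)
        return (torch.bmm(h, w2) + b2).view(B, 1)               # Q_tot: (B, 1)
```
Because the mixer is monotone in every agent's Q-value, maximising Q_tot decomposes into each agent independently maximising its own Q_i, which is what enables decentralised execution after centralised training.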
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.