AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline
Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- URL: http://arxiv.org/abs/2311.02194v1
- Date: Fri, 3 Nov 2023 18:56:48 GMT
- Title: AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline
Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- Authors: Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos,
Pieter Abbeel, Kee-Eung Kim
- Abstract summary: One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
- Score: 65.4532392602682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the main challenges in offline Reinforcement Learning (RL) is the
distribution shift that arises from the learned policy deviating from the data
collection policy. This is often addressed by avoiding out-of-distribution
(OOD) actions during policy improvement as their presence can lead to
substantial performance degradation. This challenge is amplified in the offline
Multi-Agent RL (MARL) setting since the joint action space grows exponentially
with the number of agents. To avoid this curse of dimensionality, existing MARL
methods adopt either value decomposition methods or fully decentralized
training of individual agents. However, even when combined with standard
conservatism principles, these methods can still result in the selection of OOD
joint actions in offline MARL. To address this, we introduce AlberDICE, an
offline MARL algorithm that alternately performs centralized training of
individual agents based on stationary distribution optimization. AlberDICE circumvents the
exponential complexity of MARL by computing the best response of one agent at a
time while effectively avoiding OOD joint action selection. Theoretically, we
show that the alternating optimization procedure converges to Nash policies. In
the experiments, we demonstrate that AlberDICE significantly outperforms
baseline algorithms on a standard suite of MARL benchmarks.
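To make the alternating structure concrete, here is a minimal, purely illustrative sketch (hypothetical code, not the authors' implementation) on a one-step two-player matrix game: agents take turns computing an exact best response while only ever considering joint actions that appear in the offline dataset, so no OOD joint action can be selected. The paper's stationary distribution correction (DICE) objective is replaced by a simple empirical return estimate, and all names below are invented for illustration.

```python
from collections import defaultdict
import random

N_ACTIONS = 3  # actions {0, 1, 2} per agent; action 1 never appears in the data

def true_reward(a0, a1):
    # Hypothetical payoff: coordinating on action 0 is best, on action 2 is decent.
    if a0 == a1 == 0:
        return 1.0
    if a0 == a1 == 2:
        return 0.6
    return -0.1

# Offline dataset: joint actions from a behavior policy that only plays 0 or 2,
# so every joint action involving action 1 is out-of-distribution (OOD).
random.seed(0)
dataset = []
for _ in range(200):
    joint = (random.choice([0, 2]), random.choice([0, 2]))
    dataset.append((joint, true_reward(*joint)))

# Empirical mean return of every joint action that appears in the data.
sums, counts = defaultdict(float), defaultdict(int)
for joint, r in dataset:
    sums[joint] += r
    counts[joint] += 1
q_hat = {joint: sums[joint] / counts[joint] for joint in sums}

policies = [2, 0]  # deterministic action of each agent; start uncoordinated

def best_response(agent, policies):
    """Best response of `agent` with the other agent held fixed, searched only
    over joint actions present in the dataset (OOD joint actions are skipped)."""
    other = 1 - agent
    best_a, best_v = policies[agent], float("-inf")
    for a in range(N_ACTIONS):
        joint = (a, policies[other]) if agent == 0 else (policies[other], a)
        if joint not in q_hat:   # never evaluate an OOD joint action
            continue
        if q_hat[joint] > best_v:
            best_a, best_v = a, q_hat[joint]
    return best_a

# Alternating optimization: update one agent at a time until no agent changes,
# i.e. until the joint policy is a (data-supported) equilibrium of q_hat.
for sweep in range(10):
    changed = False
    for agent in (0, 1):
        new_action = best_response(agent, policies)
        changed |= (new_action != policies[agent])
        policies[agent] = new_action
    if not changed:
        break

print("converged joint policy:", tuple(policies))  # -> (0, 0) for this toy data
```

In the full algorithm the inner step would instead solve a per-agent DICE optimization over stationary distributions rather than a table lookup, but the outer one-agent-at-a-time alternation and the restriction to data-supported joint actions are the ideas being illustrated.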
Related papers
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
MARL presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning [24.501511979962746]
Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly.
We present EAQ, Episodes Augmentation guided by Q-total loss, a novel approach for the offline MARL framework that utilizes diffusion models.
arXiv Detail & Related papers (2024-08-23T14:17:17Z)
- Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties [3.269165283595478]
In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors.
The proposed decentralized smoothing ADMM addresses the limitations of existing methods, such as the sub-gradient consensus approach, which fails to distinguish between active and non-active coefficients.
arXiv Detail & Related papers (2024-08-02T15:00:04Z)
- Noise Distribution Decomposition based Multi-Agent Distributional Reinforcement Learning [15.82785057592436]
Multi-Agent Reinforcement Learning (MARL) is more susceptible to noise due to the interference among intelligent agents.
We propose a novel decomposition-based multi-agent distributional RL method by approximating the globally shared noisy reward.
We also verify the effectiveness of the proposed method through extensive simulation experiments with noisy rewards.
arXiv Detail & Related papers (2023-12-12T07:24:15Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that, by using Fenchel duality, we can avoid the double sampling issue when computing the gradient of the variance regularizer (a one-line sketch of this duality trick appears after this list).
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets [118.22975463000928]
We consider an offline reinforcement learning (RL) setting where the agent needs to learn from a dataset collected by rolling out multiple behavior policies.
Two challenges arise in this setting; one is that the optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal varies across states, due to the variation in action coverage induced by the different behavior policies.
In this paper, we address both challenges by using adaptively weighted reverse Kullback-Leibler (KL) divergence as the BC regularizer based on the TD3 algorithm.
arXiv Detail & Related papers (2022-12-05T09:36:23Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent learned by offline MARL often inherits a random policy present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Offline reinforcement learning (RL) algorithms can, in principle, be transferred to multi-agent settings directly, but doing so raises challenges in practice.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Divergence-Regularized Multi-Agent Actor-Critic [17.995905582226467]
We propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
DMAC is a flexible framework and can be combined with many existing MARL algorithms.
We empirically show that DMAC substantially improves the performance of existing MARL algorithms.
arXiv Detail & Related papers (2021-10-01T10:27:42Z)
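As a side note to the variance-regularization (OVAR) entry above: the double-sampling issue it mentions comes from the squared-expectation term in the variance, and the Fenchel (convex-conjugate) trick replaces that term with a maximization over an auxiliary scalar. A minimal sketch of the identity follows; the exact objective in that paper may be parameterized differently.

```latex
% Variance of a scalar return X under the data distribution:
%   Var[X] = E[X^2] - (E[X])^2 .
% An unbiased gradient of (E[X])^2 needs two independent samples of X
% ("double sampling"). Using the conjugate identity u^2 = max_nu (2*nu*u - nu^2):
\[
  \bigl(\mathbb{E}[X]\bigr)^{2}
  \;=\;
  \max_{\nu \in \mathbb{R}} \Bigl( 2\,\nu\,\mathbb{E}[X] - \nu^{2} \Bigr),
\]
% which is linear in E[X], so a single sample per gradient step suffices once
% the auxiliary scalar nu is optimized jointly with the policy.
```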