Believe What You See: Implicit Constraint Approach for Offline
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2106.03400v1
- Date: Mon, 7 Jun 2021 08:02:31 GMT
- Title: Believe What You See: Implicit Constraint Approach for Offline
Multi-Agent Reinforcement Learning
- Authors: Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao
Huang, Jun Yang, Qianchuan Zhao
- Abstract summary: Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
- Experimental results demonstrate that the extrapolation error is reduced to almost zero and is insensitive to the number of agents.
- Score: 16.707045765042505
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning from datasets without interaction with environments (Offline
Learning) is an essential step to apply Reinforcement Learning (RL) algorithms
in real-world scenarios. However, compared with the single-agent counterpart,
offline multi-agent RL introduces more agents and a larger joint state and action
space, which makes it more challenging, yet it has attracted little attention. We
demonstrate that current offline RL algorithms are ineffective in multi-agent systems
due to the accumulated extrapolation error. In this paper, we propose a novel offline RL
algorithm, named Implicit Constraint Q-learning (ICQ), which effectively
alleviates the extrapolation error by only trusting the state-action pairs
given in the dataset for value estimation. Moreover, we extend ICQ to
multi-agent tasks by decomposing the joint policy under the implicit
constraint. Experimental results demonstrate that the extrapolation error is
reduced to almost zero and is insensitive to the number of agents. We further show
that ICQ achieves state-of-the-art performance on challenging
multi-agent offline tasks (StarCraft II).
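To make the core mechanism concrete, below is a minimal sketch of an in-sample, SARSA-style Bellman backup that evaluates the target Q-value only at next actions stored in the dataset, plus a simplified softmax re-weighting in the spirit of the implicit constraint. It is an assumed PyTorch illustration, not the authors' released implementation; the network, batch fields, and temperature beta are illustrative choices.

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Illustrative Q-network for a discrete action space."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # [B, n_actions]


def in_sample_td_loss(q_net, target_q_net, batch, gamma=0.99, beta=1.0):
    """TD loss that only trusts state-action pairs present in the dataset.

    The target uses Q(s', a'_data), where a'_data is the next action actually
    stored in the batch, instead of max_a' Q(s', a'), so no out-of-distribution
    action is ever queried. The softmax weights are a simplified stand-in for
    the implicit-constraint re-weighting described in the abstract.
    """
    obs, act = batch["obs"], batch["act"]                    # [B, obs_dim], [B]
    rew, done = batch["rew"], batch["done"]                  # [B], [B]
    next_obs, next_act = batch["next_obs"], batch["next_act"]

    q_sa = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Evaluate the target ONLY at the in-dataset next action.
        next_q = target_q_net(next_obs).gather(1, next_act.unsqueeze(1)).squeeze(1)
        target = rew + gamma * (1.0 - done) * next_q
        # Simplified implicit-constraint style weighting over the batch.
        weights = torch.softmax(next_q / beta, dim=0) * next_q.numel()

    return (weights * (q_sa - target) ** 2).mean()
```

A multi-agent variant in the spirit of the abstract would learn one such local critic per agent and combine them with a value-decomposition network; that part is omitted here.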
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Learning RL-Policies for Joint Beamforming Without Exploration: A Batch
Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization for joint beamforming in wireless networks.
We show that learning and real-world deployment can be achieved from the available data alone, without online exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z) - Offline Multi-Agent Reinforcement Learning with Coupled Value
Factorization [2.66512000865131]
We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization.
OMAC performs in-sample learning on the local state-value functions, which implicitly conducts the max-Q operation at the local level.
We demonstrate the superior performance of OMAC over the state-of-the-art offline multi-agent RL methods.
arXiv Detail & Related papers (2023-06-15T07:08:41Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained with offline MARL often inherits a random behavior policy contained in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement
Learning with Actor Rectification [74.10976684469435]
Offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly, but the resulting policy updates are prone to getting stuck in poor local optima.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a flexible alternative by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline
and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce the Expected-Max Q-Learning (EMaQ) operator, which is more closely aligned with the practical algorithm it gives rise to.
arXiv Detail & Related papers (2020-07-21T21:13:02Z)
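As a companion illustration for the EMaQ entry above, here is a minimal sketch (assumed PyTorch, with illustrative function names) of an Expected-Max style target: instead of maximizing over the whole action space, it maximizes over N candidate actions drawn from a behavior-policy model fitted to the dataset.

```python
import torch


def emaq_target(target_q, behavior_model, next_obs, rew, done,
                n_samples=10, gamma=0.99):
    """Expected-Max style backup (illustrative, not the paper's code).

    `behavior_model(next_obs, n)` is assumed to return candidate actions of
    shape [B, n, act_dim] sampled from a generative model of the dataset's
    behavior policy; `target_q(obs, act)` is assumed to return Q-values of
    shape [B]. The maximum is taken only over these in-distribution samples.
    """
    with torch.no_grad():
        cand = behavior_model(next_obs, n_samples)             # [B, N, act_dim]
        b, n, act_dim = cand.shape
        obs_rep = next_obs.unsqueeze(1).expand(b, n, next_obs.shape[-1])
        q = target_q(obs_rep.reshape(b * n, -1),
                     cand.reshape(b * n, act_dim)).reshape(b, n)
        max_q = q.max(dim=1).values                            # max over N samples
        return rew + gamma * (1.0 - done) * max_q
```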
This list is automatically generated from the titles and abstracts of the papers on this site.