Off-Beat Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2205.13718v1
- Date: Fri, 27 May 2022 02:21:04 GMT
- Title: Off-Beat Multi-Agent Reinforcement Learning
- Authors: Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana
Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan
- Abstract summary: We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
- Score: 62.833358249873704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate model-free multi-agent reinforcement learning (MARL) in
environments where off-beat actions are prevalent, i.e., all actions have
pre-set execution durations. During execution durations, the environment
changes are influenced by, but not synchronised with, action execution. Such a
setting is ubiquitous in many real-world problems. However, most MARL methods
assume actions are executed immediately after inference, which is often
unrealistic and can lead to catastrophic failure for multi-agent coordination
with off-beat actions. To fill this gap, we develop an algorithmic framework
for MARL with off-beat actions and propose a novel episodic memory, LeGEM,
for model-free MARL algorithms. LeGEM builds each agent's episodic memory
from its individual experiences. It boosts multi-agent learning by addressing
the challenging temporal credit assignment problem raised by off-beat actions
through a novel reward redistribution scheme, alleviating the issue of
non-Markovian rewards. We evaluate LeGEM on various
multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry
Game, Afforestation Game, and StarCraft II micromanagement tasks. Empirical
results show that LeGEM significantly boosts multi-agent coordination and
achieves leading performance and improved sample efficiency.
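To make the off-beat setting concrete, here is a toy sketch (our own illustration with hypothetical durations and reward logic, not the paper's code): each action carries a pre-set execution duration, so its effect, and the reward it triggers, lands several steps after the agents commit to it, which is exactly what breaks step-aligned credit assignment.
```python
import random

class OffBeatEnv:
    """Toy environment with off-beat actions: every action has a pre-set
    execution duration, so its effect is applied (and rewarded) several
    steps after the agents commit to it. All numbers are hypothetical."""

    DURATIONS = {0: 1, 1: 3, 2: 5}  # hypothetical per-action durations (steps)

    def __init__(self, n_agents=2):
        self.n_agents = n_agents
        self.t = 0
        self.pending = []  # (finish_time, agent_id, action)

    def step(self, joint_action):
        # Commit each agent's action; its effect lands DURATIONS[a] steps later.
        for agent, action in enumerate(joint_action):
            self.pending.append((self.t + self.DURATIONS[action], agent, action))
        self.t += 1
        # Apply every action whose execution duration has now elapsed.
        landing = [p for p in self.pending if p[0] <= self.t]
        self.pending = [p for p in self.pending if p[0] > self.t]
        # Reward arrives when effects land, not when actions were chosen:
        # this mismatch is the off-beat credit assignment problem.
        return float(len(landing))

env = OffBeatEnv()
for _ in range(6):
    reward = env.step([random.randrange(3) for _ in range(env.n_agents)])
    print(f"t={env.t} reward={reward}")
```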
Related papers
- Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning [27.81925751697255]
We propose Imagine, Initialize, and Explore (IIE), a novel method for efficient multi-agent exploration in complex scenarios.
We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively.
By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important underexplored regions.
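As a rough illustration of the "imagination as sequence modeling" idea (the model, field ordering, and token types below are our assumptions, not the paper's implementation), an autoregressive predictor can roll out interleaved trajectory tokens conditioned on a prompt describing the target critical state:
```python
import random

class DummySequenceModel:
    """Stand-in for an autoregressive next-token predictor (e.g., a small
    transformer). The real method would learn this from experience; here
    it just samples placeholder values."""
    def predict_next(self, sequence, field):
        return random.random()

def imagine_trajectory(model, prompt, horizon):
    """Autoregressively imagine (state, obs, action, reward) tokens,
    conditioned on a prompt specifying the target critical state."""
    sequence = list(prompt)  # prompt tokens condition the whole rollout
    for _ in range(horizon):
        for field in ("state", "obs", "action", "reward"):
            sequence.append((field, model.predict_next(sequence, field)))
    return sequence

trajectory = imagine_trajectory(DummySequenceModel(), [("prompt", "reach-door")], 3)
print(len(trajectory))  # 1 prompt token + 3 steps x 4 fields = 13 tokens
```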
arXiv Detail & Related papers (2024-02-28T01:45:01Z)
- Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration [16.681164058779146]
We consider the problem of cooperative exploration where multiple robots need to cooperatively explore an unknown region as fast as possible.
Existing MARL-based methods use the number of action-making steps as the metric for exploration efficiency, assuming all agents act in a fully synchronous manner.
We propose an asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle this real-world challenge.
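A toy event-queue sketch of what asynchrony means here (our illustration, not ACE itself): each robot re-plans the moment its own action finishes, so decisions are not taken in global lockstep.
```python
import heapq
import random

def run_async_exploration(n_robots=3, horizon=10.0):
    """Sketch: an event queue where each robot, on finishing its current
    action, immediately chooses the next one. There is no synchronized
    'round'; durations and the policy are placeholders."""
    events = [(0.0, robot) for robot in range(n_robots)]  # (time free, id)
    heapq.heapify(events)
    while events:
        t, robot = heapq.heappop(events)
        if t >= horizon:
            continue  # this robot is done; others may still be acting
        duration = random.uniform(0.5, 2.0)  # placeholder action length
        print(f"t={t:5.2f}: robot {robot} starts a {duration:.2f}s action")
        heapq.heappush(events, (t + duration, robot))

run_async_exploration()
```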
arXiv Detail & Related papers (2023-01-09T14:53:38Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete the given tasks, and it significantly boosts performance by up to 402% on average.
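One plausible reading of a ranked policy memory, sketched under our own assumptions rather than the authors' code: policy checkpoints are bucketed by their training score, and training partners are drawn across ranks so an agent keeps meeting behaviors of varying strength.
```python
import random
from collections import defaultdict

class RankedPolicyMemory:
    """Sketch: store policy checkpoints keyed by a coarse rank of their
    training score, then sample partners uniformly over ranks to diversify
    the behaviors seen during training. The bucketing rule is an assumption."""

    def __init__(self, bucket_width=0.1):
        self.bucket_width = bucket_width
        self.buckets = defaultdict(list)

    def save(self, policy, score):
        rank = int(score / self.bucket_width)  # coarse score bucket
        self.buckets[rank].append(policy)

    def sample_partner(self):
        rank = random.choice(list(self.buckets))   # uniform over ranks,
        return random.choice(self.buckets[rank])   # then within the rank

memory = RankedPolicyMemory()
for step, score in enumerate([0.05, 0.12, 0.31, 0.33]):
    memory.save(f"checkpoint-{step}", score)
print(memory.sample_partner())
```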
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- The StarCraft Multi-Agent Challenges+: Learning of Multi-Stage Tasks and Environmental Factors without Precise Reward Functions [14.399479538886064]
We propose a novel benchmark called the StarCraft Multi-Agent Challenges+.
This challenge is interested in the exploration capability of MARL algorithms to efficiently learn implicit multi-stage tasks and environmental factors as well as micro-control.
We investigate MARL algorithms under SMAC+ and observe that recent approaches work well in similar settings to the previous challenges, but misbehave in offensive scenarios.
arXiv Detail & Related papers (2022-07-05T12:43:54Z)
- Multi-agent Actor-Critic with Time Dynamical Opponent Model [16.820873906787906]
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other.
We propose a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time.
We show empirically that TDOM achieves superior opponent behavior prediction during test time.
arXiv Detail & Related papers (2022-04-12T07:16:15Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before their teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states.
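A minimal masked-attention sketch of that idea (shapes and the masking rule are our assumptions): instead of feeding absorbing states for terminated agents through a fully connected layer, attention weights for dead agents are simply driven to zero.
```python
import numpy as np

def masked_attention(queries, keys, values, alive):
    """Attend only over agents that are still alive. queries/keys/values
    have shape (n_agents, d); alive is a boolean mask of shape (n_agents,).
    Terminated agents receive ~zero attention weight instead of being
    represented by absorbing states."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores = np.where(alive[None, :], scores, -1e9)  # mask out dead agents
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
alive = np.array([True, True, False, True])  # agent 2 has terminated
print(masked_attention(q, k, v, alive).shape)  # (4, 8)
```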
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution (CTDE) paradigm.
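A tiny sketch of value decomposition with local rewards (our reading, not LOMAQ's architecture): each partition's utility is regressed against its own neighborhood reward rather than one shared global TD target.
```python
def local_td_losses(local_qs, local_rewards, next_local_qs, gamma=0.99):
    """Sketch: per-partition TD losses. Each local utility chases the reward
    of its own neighborhood; the global value is implicitly their sum.
    Inputs are per-partition scalars; all values below are placeholders."""
    return [
        (q - (r + gamma * next_q)) ** 2
        for q, r, next_q in zip(local_qs, local_rewards, next_local_qs)
    ]

print(local_td_losses([1.0, 0.5], [0.1, 0.2], [0.9, 0.6]))
```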
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
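A hedged sketch of a regularization-based update of this flavor (the baseline choice and coefficients are our assumptions, not the paper's exact formula): the usual TD loss gains a penalty keeping the chosen joint action-value near a softmax-weighted baseline over action values, which damps overestimation.
```python
import numpy as np

def regularized_td_loss(q_values, action, td_target, tau=1.0, lam=0.1):
    """TD error plus a penalty tying Q(s, a) to a softmax-weighted baseline
    over the action values. tau (temperature), lam (penalty weight), and
    the exact baseline are assumptions for illustration."""
    logits = q_values / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    baseline = float(probs @ q_values)      # softmax-weighted value estimate
    td_error = q_values[action] - td_target
    penalty = q_values[action] - baseline   # deviation from the baseline
    return td_error ** 2 + lam * penalty ** 2

q = np.array([1.0, 2.5, 0.3])
print(regularized_td_loss(q, action=1, td_target=2.0))
```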
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
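The "linear decomposition of universal successor features" follows the standard successor-features identity (a generic sketch, not UneVEn's architecture): when rewards are linear in features, r = phi(s, a) . w, the action value for any task with weights w is Q(s, a; w) = psi(s, a) . w.
```python
import numpy as np

def q_from_successor_features(psi, w):
    """Generic successor-features identity: psi(s, a) accumulates expected
    discounted features under the policy, so Q(s, a; w) = psi(s, a) . w for
    any task whose reward is linear in those features. Values below are
    placeholders, not learned quantities."""
    return psi @ w

psi = np.array([[0.2, 1.0],   # psi(s, a0)
                [0.9, 0.1]])  # psi(s, a1)
w_task = np.array([0.5, 0.5])  # task reward weights
print(q_from_successor_features(psi, w_task))  # per-action Q values
```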
arXiv Detail & Related papers (2020-10-06T19:08:47Z)