UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.02974v3
- Date: Thu, 10 Jun 2021 17:48:48 GMT
- Title: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning
- Authors: Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon
Whiteson
- Abstract summary: We propose a novel MARL approach called Universal Value Exploration (UneVEn)
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
- Score: 53.73686229912562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: VDN and QMIX are two popular value-based algorithms for cooperative MARL that
learn a centralized action value function as a monotonic mixing of per-agent
utilities. While this enables easy decentralization of the learned policy, the
restricted joint action value function can prevent them from solving tasks that
require significant coordination between agents at a given timestep. We show
that this problem can be overcome by improving the joint exploration of all
agents during training. Specifically, we propose a novel MARL approach called
Universal Value Exploration (UneVEn) that learns a set of related tasks
simultaneously with a linear decomposition of universal successor features.
With the policies of already solved related tasks, the joint exploration
process of all agents can be improved to help them achieve better coordination.
Empirical results on a set of exploration games, challenging cooperative
predator-prey tasks requiring significant coordination among agents, and
StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where
other state-of-the-art MARL methods fail.
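To make the two ingredients named in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): (i) a VDN-style linear decomposition that sums per-agent utilities into the joint value, and (ii) universal successor features, under the assumption that rewards are linear in state-action features, r(s, a; w) = phi(s, a) . w, so each agent's utility for a task vector w is Q_i(tau_i, a_i; w) = psi_i(tau_i, a_i) . w. All shapes, names, and the random stand-ins for learned features are illustrative assumptions.

```python
import numpy as np

# Minimal, hypothetical sketch -- not the authors' implementation.
# Assumptions (all illustrative): rewards are linear in features,
# r(s, a; w) = phi(s, a) . w; each agent i keeps universal successor
# features psi_i, so its utility for task w is psi_i(tau_i, a_i) . w;
# a VDN-style linear decomposition sums per-agent utilities.

n_agents, n_actions, d = 2, 3, 4        # d: dimensionality of phi, psi and w

# Random stand-ins for learned universal successor features,
# shape (n_agents, n_actions, d).
psi = np.random.rand(n_agents, n_actions, d)

def greedy_joint_value(w):
    """Decentralized greedy actions and VDN-style joint value for task w."""
    per_agent_q = psi @ w                        # (n_agents, n_actions)
    greedy = per_agent_q.argmax(axis=1)          # each agent maximizes its own utility
    q_tot = per_agent_q[np.arange(n_agents), greedy].sum()
    return greedy, q_tot

# The same successor features can be re-evaluated under a different task
# vector (a "related task") without retraining -- the property that lets
# policies of already solved related tasks guide joint exploration.
target_task  = np.array([1.0, 0.0, 0.0, 0.0])
related_task = np.array([0.8, 0.2, 0.0, 0.0])
print(greedy_joint_value(target_task))
print(greedy_joint_value(related_task))
```

Because each utility is linear in w, the joint greedy policy for any related task vector is available from the same learned features, which is what makes it cheap to use already-solved related tasks to bias joint exploration.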
Related papers
- Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent
Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
arXiv Detail & Related papers (2024-02-06T13:02:00Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Adaptive Value Decomposition with Greedy Marginal Contribution
Computation for Cooperative Multi-Agent Reinforcement Learning [48.41925886860991]
Real-world cooperation often requires intensive, simultaneous coordination among agents.
Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve the tasks with non-monotonic returns.
We propose a novel explicit credit assignment method to address the non-monotonic problem.
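A standard one-step matrix game illustrates what "non-monotonic returns" means for monotonic value factorization; the payoffs below are the usual textbook illustration of relative overgeneralization, not values taken from this paper. Under uniform exploration by its partner, each agent's per-action utility ranks the coordinated action worst, so decentralized greedy agents converge to a suboptimal joint action.

```python
import numpy as np

# Hypothetical one-step cooperative matrix game often used to illustrate
# non-monotonic returns (payoffs are illustrative, not from the paper).
payoff = np.array([[  8, -12, -12],
                   [-12,   0,   0],
                   [-12,   0,   0]])

# If the partner explores uniformly, each agent's per-action utility is the
# mean of the joint payoff over the partner's actions.
util_a = payoff.mean(axis=1)    # agent A: [-5.33, -4.0, -4.0]
util_b = payoff.mean(axis=0)    # agent B: [-5.33, -4.0, -4.0]

# Greedy decentralized agents avoid action 0 and end at a joint payoff of 0,
# even though the coordinated joint action (0, 0) pays 8 -- the gap that
# better joint exploration or explicit credit assignment is meant to close.
print("joint payoff reached:", payoff[util_a.argmax(), util_b.argmax()])
print("optimal joint payoff:", payoff.max())
```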
arXiv Detail & Related papers (2023-02-14T07:23:59Z) - Self-Motivated Multi-Agent Exploration [38.55811936029999]
In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration.
Recent works mainly concentrate on agents' coordinated exploration, which entails exploring a joint state space that grows exponentially with the number of agents.
We propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation.
arXiv Detail & Related papers (2023-01-05T14:42:39Z) - CURO: Curriculum Learning for Relative Overgeneralization [6.573807158449973]
Relative overgeneralization (RO) is a pathology that can arise in cooperative multi-agent tasks.
We propose a novel approach called curriculum learning for relative overgeneralization (CURO)
arXiv Detail & Related papers (2022-12-06T03:41:08Z) - Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent
Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
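As a rough sketch of how an entropy-based goal selection over projected state spaces could work (an illustration under assumed details, not the CMAE implementation): among several low-dimensional projections of the state space, prefer the one whose visitation distribution has the lowest normalized entropy, i.e. the least evenly explored one, and use an under-visited state in it as the shared goal.

```python
import numpy as np

# Rough sketch (not the CMAE implementation) of normalized-entropy-based
# goal selection over projected state spaces.
def select_goal(visit_counts_per_space):
    """visit_counts_per_space: list of 1-D arrays of per-state visit counts."""
    best_space, best_entropy = None, np.inf
    for i, counts in enumerate(visit_counts_per_space):
        p = (counts + 1e-8) / (counts.sum() + 1e-8 * len(counts))
        entropy = -(p * np.log(p)).sum() / np.log(len(counts))  # normalized to [0, 1]
        if entropy < best_entropy:                               # least evenly explored space
            best_space, best_entropy = i, entropy
    goal_state = visit_counts_per_space[best_space].argmin()     # least-visited state in it
    return best_space, goal_state

# The second (skewed) space is selected and its rarely visited state becomes the goal.
print(select_goal([np.array([50, 49, 51]), np.array([90, 5, 5])]))
```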
arXiv Detail & Related papers (2021-07-23T20:06:32Z) - Modeling the Interaction between Agents in Cooperative Multi-Agent
Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)