Skill Discovery of Coordination in Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2006.04021v1
- Date: Sun, 7 Jun 2020 02:04:15 GMT
- Title: Skill Discovery of Coordination in Multi-agent Reinforcement Learning
- Authors: Shuncheng He, Jianzhun Shao, Xiangyang Ji
- Abstract summary: We propose "Multi-agent Skill Discovery" (MASD), a method for discovering skills that capture the coordination patterns of multiple agents.
We show the emergence of various skills at the level of coordination in a general particle multi-agent environment.
We also reveal that the "bottleneck" prevents skills from collapsing to a single agent and enhances the diversity of learned skills.
- Score: 41.67943127631515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised skill discovery drives intelligent agents to explore an unknown
environment without a task-specific reward signal; in the process, the agents acquire
skills that may be useful when they adapt to new tasks. In this paper, we propose
"Multi-agent Skill Discovery" (MASD), a method for discovering skills that capture the
coordination patterns of multiple agents. The proposed method aims to maximize the
mutual information between a latent code Z representing skills and the joint state of
all agents. Meanwhile, it suppresses the empowerment of Z over the state of any single
agent through adversarial training. In other words, it imposes an information
bottleneck to avoid empowerment degeneracy. First, we show the emergence of various
skills at the level of coordination in a general particle multi-agent environment.
Second, we reveal that the "bottleneck" prevents skills from collapsing to a single
agent and enhances the diversity of learned skills. Finally, we show that the
pretrained policies achieve better performance on supervised RL tasks.
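To make the objective concrete, here is a minimal sketch of how a mutual-information skill reward with an adversarial per-agent bottleneck could be computed, using the common discriminator-based lower bound on I(Z; S) (as in DIAYN-style methods). The class and function names, the penalty weight beta, and the max over per-agent discriminators are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a mutual-information skill reward with an
# adversarial per-agent "bottleneck". Names, network sizes, and the penalty
# weight `beta` are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class SkillDiscriminator(nn.Module):
    """Predicts the skill code z from a state; log q(z|s) lower-bounds I(Z; S)."""

    def __init__(self, state_dim, n_skills, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def log_prob(self, state, z):
        # log q(z | state) for the skill index z of each batch element
        log_q = torch.log_softmax(self.net(state), dim=-1)
        return log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)


def masd_style_reward(joint_disc, agent_discs, joint_state, agent_states, z,
                      n_skills, beta=0.5):
    """r = [log q(z | s_joint) - log p(z)] - beta * max_i log q_i(z | s_i).

    The first term rewards skills that are identifiable from the *joint* state
    (the mutual-information objective); the penalty suppresses skills that are
    already identifiable from any single agent's state (the bottleneck).
    """
    log_p_z = -torch.log(torch.tensor(float(n_skills)))  # uniform skill prior
    joint_term = joint_disc.log_prob(joint_state, z) - log_p_z
    per_agent = torch.stack(
        [d.log_prob(s, z) for d, s in zip(agent_discs, agent_states)]
    )
    return joint_term - beta * per_agent.max(dim=0).values
```

In this picture, the joint discriminator and the per-agent discriminators are all trained to predict z from their respective inputs, while the policy is rewarded for making z recoverable only from the joint state; the policy and the per-agent discriminators therefore play the adversarial game the abstract refers to.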
Related papers
- ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose ELIGN (expectation alignment), a self-supervised intrinsic reward.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
arXiv Detail & Related papers (2022-10-09T22:24:44Z)
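As a rough illustration of an expectation-alignment-style intrinsic reward for the entry above: the sketch below rewards an agent when its observed transition matches what its neighbors' learned forward models expected. The function names, the neighborhood interface, and the negative-squared-error form are illustrative assumptions, not ELIGN's exact formulation.

```python
# Rough sketch of an expectation-alignment intrinsic reward: agent i is
# rewarded when its next observation matches what its neighbors' learned
# forward models predicted. Names and the squared-error form are assumptions.
import numpy as np


def expectation_alignment_reward(neighbor_models, obs_i, act_i, next_obs_i):
    """Average negative prediction error of neighbors' expectations about agent i.

    neighbor_models: callables f_j(obs, act) -> predicted next observation,
    one per neighbor currently within agent i's observation radius.
    """
    if not neighbor_models:
        return 0.0  # no neighbors, no alignment signal
    errors = [
        np.mean((f_j(obs_i, act_i) - next_obs_i) ** 2) for f_j in neighbor_models
    ]
    # Higher reward when the agent behaves as its neighbors expect.
    return -float(np.mean(errors))
```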
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
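The adversarial game in the entry above can be pictured as a zero-sum score over an estimate of surprise: an Explore policy is paid for the surprise the agent experiences and a Control policy is paid for suppressing it. The density-model interface and the two-phase structure below are illustrative assumptions about how such a game could be scored, not the paper's exact setup.

```python
def surprise_rewards(density_model, obs, phase):
    """Zero-sum scoring: the Explore policy is paid the surprise, Control pays it.

    density_model must expose log_prob(obs) and update(obs); that interface and
    the two-phase structure are illustrative assumptions.
    """
    surprise = -density_model.log_prob(obs)  # high when obs is unfamiliar
    density_model.update(obs)                # keep the density estimate current
    if phase == "explore":
        return surprise, 0.0                 # Explore policy is acting
    return 0.0, -surprise                    # Control policy is acting
```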
- Human-Inspired Multi-Agent Navigation using Knowledge Distillation [4.659427498118277]
We propose a framework for learning a human-like general collision avoidance policy for agent-agent interactions.
Our approach uses knowledge distillation with reinforcement learning to shape the reward function.
We show that agents trained with our approach can take human-like trajectories in collision avoidance and goal-directed steering tasks.
arXiv Detail & Related papers (2021-03-18T03:24:38Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
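The "linear decomposition of universal successor features" in the entry above refers to the standard successor-features identity Q(s, a, w) = psi(s, a) . w, where w encodes a task's reward weights, so action values for many related tasks can be read off shared features. The array shapes and the generalized-policy-improvement max below are illustrative assumptions, not UneVEn's full algorithm.

```python
# Minimal sketch of task evaluation with successor features:
# Q(s, a, w) = psi(s, a) . w for any task weight vector w. Shapes and the
# generalized-policy-improvement (GPI) max are illustrative assumptions.
import numpy as np


def q_values(psi, w):
    """psi: (n_actions, d) successor features at a state; w: (d,) task weights."""
    return psi @ w  # Q(s, a, w) for every action


def gpi_action(psi_per_policy, w):
    """psi_per_policy: (n_policies, n_actions, d) successor features of several
    policies at the same state. Picks the action whose best backing policy has
    the highest predicted value on task w."""
    q = psi_per_policy @ w            # (n_policies, n_actions)
    return int(q.max(axis=0).argmax())
```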
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
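A learned incentive function of the kind described in the entry above can be thought of as an extra network each agent carries that maps its observation and the other agents' recent actions to reward "gifts" added to those agents' environment rewards. The architecture and the bounded-sigmoid output below are illustrative assumptions; the paper additionally trains the incentive parameters by differentiating through the recipients' learning updates, which this sketch does not show.

```python
# Minimal sketch (PyTorch) of a learned incentive function: agent i maps its
# observation and the other agents' one-hot actions to a bounded reward "gift"
# for each of them. Architecture and scaling are illustrative assumptions.
import torch
import torch.nn as nn


class IncentiveFunction(nn.Module):
    def __init__(self, obs_dim, n_others, n_actions, hidden=64, max_gift=1.0):
        super().__init__()
        self.max_gift = max_gift
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_others * n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, n_others),
        )

    def forward(self, obs, others_actions_onehot):
        """obs: (batch, obs_dim); others_actions_onehot: (batch, n_others, n_actions).
        Returns a non-negative gift for each other agent, in [0, max_gift]."""
        x = torch.cat([obs, others_actions_onehot.flatten(start_dim=-2)], dim=-1)
        return self.max_gift * torch.sigmoid(self.net(x))


# Each recipient then learns from its environment reward plus the gifts it
# receives, while the giver tunes its incentive parameters to maximize its own
# long-term return.
```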
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.