Towards Skilled Population Curriculum for Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2302.03429v1
- Date: Tue, 7 Feb 2023 12:30:52 GMT
- Title: Towards Skilled Population Curriculum for Multi-Agent Reinforcement
Learning
- Authors: Rundong Wang, Longtao Zheng, Wei Qiu, Bowei He, Bo An, Zinovi
Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan
- Abstract summary: We introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination.
Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents.
We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound.
- Score: 42.540853953923495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in multi-agent reinforcement learning (MARL) allow agents to
coordinate their behaviors in complex environments. However, common MARL
algorithms still suffer from scalability and sparse reward issues. One
promising approach to resolving them is automatic curriculum learning (ACL).
ACL involves a student (curriculum learner) training on tasks of increasing
difficulty controlled by a teacher (curriculum generator). Despite its success,
ACL's applicability is limited by (1) the lack of a general student framework
for dealing with the varying number of agents across tasks and the sparse
reward problem, and (2) the non-stationarity of the teacher's task due to
ever-changing student strategies. To address these limitations, we introduce a novel
automatic curriculum learning framework, Skilled Population Curriculum (SPC),
which adapts curriculum learning to multi-agent coordination. Specifically, we
endow the student with population-invariant communication and a hierarchical
skill set, allowing it to learn cooperation and behavior skills from distinct
tasks with varying numbers of agents. In addition, we model the teacher as a
contextual bandit conditioned on student policies, enabling a team of agents to
change its size while still retaining previously acquired skills. We also
analyze the inherent non-stationarity of this multi-agent automatic curriculum
teaching problem and provide a corresponding regret bound. Empirical results
show that our method improves performance, scalability, and sample
efficiency in several MARL environments.
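The teacher-student loop described above is concrete enough to sketch. The following is a minimal illustration, not the authors' implementation: it assumes a small discrete task set (the hypothetical `TASKS`, indexed by agent count), reduces the student-policy context to a fixed stub, and uses a simplified Exp3-style bandit for the teacher; the student's learning progress is a placeholder.

```python
import numpy as np

# Hypothetical curriculum: each task differs only in agent count.
TASKS = [2, 4, 8, 16]

rng = np.random.default_rng(0)
weights = np.ones(len(TASKS))  # Exp3-style arm weights
eta, explore = 0.1, 0.1        # bandit learning rate, uniform mixing

def student_progress(num_agents: int, context: np.ndarray) -> float:
    """Placeholder for the student's learning progress on a task.
    In SPC this signal would come from actually training the
    population-invariant student on the sampled task."""
    return rng.random() / num_agents

context = np.zeros(4)  # fixed stub for a student-policy summary
for step in range(100):
    probs = (1 - explore) * weights / weights.sum() + explore / len(TASKS)
    arm = rng.choice(len(TASKS), p=probs)
    reward = student_progress(TASKS[arm], context)
    # Importance-weighted update: tasks where the student improves
    # fastest get sampled more often in later rounds.
    weights[arm] *= np.exp(eta * reward / probs[arm])
```

An adversarial-bandit-style update is used in this sketch because, as the abstract notes, the teacher's problem is non-stationary: the payoff of each task shifts as the student's policies evolve.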
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
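For readers unfamiliar with the testbed, the iterated prisoner's dilemma referenced above can be reproduced in a few lines. The payoffs and the tit-for-tat opponent below are standard textbook choices, not the paper's learning-aware agents:

```python
# Standard one-shot payoffs for the row player: C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opponent_history: list[str]) -> str:
    """Cooperate first, then mirror the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

h1, h2, score = [], [], 0
for _ in range(10):
    a1, a2 = tit_for_tat(h2), tit_for_tat(h1)
    score += PAYOFF[(a1, a2)]
    h1.append(a1)
    h2.append(a2)
print(score)  # mutual cooperation earns 3 per round, 30 in total
```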
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent Representation [0.7366405857677227]
Multi-Agent Reinforcement Learning (MARL) algorithms are widely adopted for complex tasks that require collaboration and competition among agents.
We introduce a novel framework that enables transfer learning for MARL through unifying various state spaces into fixed-size inputs.
We show significant enhancements in multi-agent learning performance using maneuvering skills learned from other scenarios compared to agents learning from scratch.
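One simple way to realize the fixed-size unification the summary describes is zero-padding observations up to a cap and feeding them to a shared encoder. This is an illustrative guess at such a scheme, not the paper's architecture; `MAX_DIM` is an assumed bound:

```python
import numpy as np

MAX_DIM = 32  # assumed upper bound on per-scenario observation size

def unify(obs: np.ndarray) -> np.ndarray:
    """Zero-pad a variable-length observation to a fixed size so one
    network can be reused across scenarios (illustrative scheme only)."""
    if obs.shape[0] > MAX_DIM:
        raise ValueError("observation exceeds assumed maximum size")
    padded = np.zeros(MAX_DIM, dtype=np.float32)
    padded[: obs.shape[0]] = obs
    return padded

# Observations from two scenarios with different state spaces now
# share one input shape and can feed the same policy network.
small = unify(np.ones(8, dtype=np.float32))
large = unify(np.ones(24, dtype=np.float32))
assert small.shape == large.shape == (MAX_DIM,)
```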
arXiv Detail & Related papers (2024-02-13T02:48:18Z)
- Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi [15.917861586043813]
We show that state-of-the-art zero-shot coordination (ZSC) algorithms perform poorly when paired with agents trained with different learning methods.
We create a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods.
arXiv Detail & Related papers (2023-08-20T14:44:50Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
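A reward machine is a finite-state automaton whose transitions fire on high-level events and emit rewards, which is what lets it encode non-Markovian reward structure. The sketch below shows the bare data structure; the event labels are invented for illustration, and the cooperative decomposition the paper learns is not reproduced:

```python
# A reward machine: states are automaton nodes, transitions fire on
# high-level events and emit a scalar reward. Event names here
# ("got_key", "opened_door") are invented for illustration.
TRANSITIONS = {
    ("u0", "got_key"): ("u1", 0.0),
    ("u1", "opened_door"): ("u2", 1.0),  # task complete
}

def rm_step(state: str, event: str) -> tuple[str, float]:
    """Advance the reward machine; unknown events self-loop with 0 reward.
    Conditioning the RL agent on `state` restores the Markov property."""
    return TRANSITIONS.get((state, event), (state, 0.0))

state, total = "u0", 0.0
for event in ["bumped_wall", "got_key", "opened_door"]:
    state, r = rm_step(state, event)
    total += r
assert state == "u2" and total == 1.0
```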
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
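The ability-based selection can be pictured as scoring each agent against each subtask and sampling assignments from a softmax over those scores. A minimal sketch under that assumption, with random placeholders where LDSA would use learned representations:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_subtasks, dim = 4, 3, 8

# Placeholder embeddings; in LDSA these would be learned.
agent_ability = rng.normal(size=(n_agents, dim))
subtask_repr = rng.normal(size=(n_subtasks, dim))

scores = agent_ability @ subtask_repr.T            # (agents, subtasks)
logits = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Each agent samples a subtask in proportion to its estimated ability.
assignment = [rng.choice(n_subtasks, p=p) for p in probs]
print(assignment)
```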
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation [107.10235120286352]
Training general-purpose reinforcement learning agents efficiently requires automatic generation of a goal curriculum.
We propose Curriculum Self Play (CuSP), an automated goal generation framework.
We demonstrate that our method succeeds at generating effective curricula of goals for a range of control tasks.
arXiv Detail & Related papers (2022-02-22T01:23:23Z)
- Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems [42.973910399533054]
We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving cooperative multi-agent reinforcement learning problems.
Our VACL algorithm implements a variational curriculum paradigm with two practical components: task expansion and entity progression.
Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents.
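Reading "task expansion" as widening the solved task region and "entity progression" as growing the agent population on success, a cartoon of the outer loop looks as follows; the thresholds and the success stub are invented, and VACL's variational machinery is not reproduced:

```python
import random

random.seed(0)
num_agents = 2          # entity progression starts small
difficulty = 0.1        # task expansion widens the solved region

def evaluate(num_agents: int, difficulty: float) -> bool:
    """Stub for the student's success; a real system would measure
    it by rolling out the current policy on sampled tasks."""
    return random.random() > difficulty * num_agents / 10

for epoch in range(50):
    if evaluate(num_agents, difficulty):
        # Expand the active task region around solved tasks...
        difficulty = min(1.0, difficulty + 0.05)
    if difficulty >= 0.5 and num_agents < 16:
        # ...and progress to more entities once mid-range tasks are solved.
        num_agents *= 2
        difficulty = 0.1  # restart expansion at the new population size
print(num_agents, difficulty)
```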
arXiv Detail & Related papers (2021-11-08T16:35:08Z)
- SS-MAIL: Self-Supervised Multi-Agent Imitation Learning [18.283839252425803]
Multi-agent imitation learning is dominated by two families of algorithms: Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL).
BC approaches suffer from compounding errors, as they ignore the sequential decision-making nature of the trajectory generation problem.
AIL methods are plagued with instability in their training dynamics.
We introduce a novel self-supervised loss that encourages the discriminator to approximate a richer reward function.
arXiv Detail & Related papers (2021-10-18T01:17:50Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
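From the summary alone, the update combines a softmax operator over joint action-values with a penalty for drifting from a baseline. The sketch below is a generic reconstruction under that reading, using a previous value estimate as the baseline; it is not the paper's exact estimator:

```python
import numpy as np

def softmax_value(q: np.ndarray, beta: float = 10.0) -> float:
    """Softmax operator: interpolates between max (beta -> inf) and
    mean (beta -> 0) of the action-values, softening overestimation."""
    w = np.exp(beta * (q - q.max()))
    return float((w / w.sum()) @ q)

def regularized_target(r, q_next, q_baseline, gamma=0.99, lam=0.1):
    """TD target with a penalty pulling the bootstrapped value toward
    a baseline (here: a previous estimate), as the summary describes."""
    v = softmax_value(q_next)
    return r + gamma * (v - lam * (v - q_baseline))

target = regularized_target(r=1.0, q_next=np.array([0.5, 2.0, 1.0]),
                            q_baseline=1.2)
print(target)
```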
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch [14.334987432342707]
We propose a new framework for transfer learning where the teacher and the student can have arbitrarily different state- and action-spaces.
To handle this mismatch, we produce embeddings which can systematically extract knowledge from the teacher policy and value networks.
We demonstrate successful transfer learning in situations when the teacher and student have different state- and action-spaces.
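A minimal version of the embedding idea: project teacher and student states into a shared latent space so the teacher's knowledge can be queried despite mismatched dimensions. The random linear maps below stand in for the learned embedding networks the summary mentions:

```python
import numpy as np

rng = np.random.default_rng(0)
teacher_dim, student_dim, latent_dim = 10, 6, 4

# Random linear maps stand in for learned embedding networks.
W_teacher = rng.normal(size=(latent_dim, teacher_dim))
W_student = rng.normal(size=(latent_dim, student_dim))

def teacher_value(z: np.ndarray) -> float:
    """Stub teacher value function defined on the shared latent space;
    in practice it would be distilled from the teacher's networks."""
    return float(np.tanh(z).sum())

# Teacher and student states of different dimensions map into the
# same latent space, so the teacher's value estimate can guide the
# student despite the mismatch.
z_teacher = W_teacher @ rng.normal(size=teacher_dim)
z_student = W_student @ rng.normal(size=student_dim)
print(teacher_value(z_teacher), teacher_value(z_student))
```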
arXiv Detail & Related papers (2020-06-12T09:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.