Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
- URL: http://arxiv.org/abs/2505.04317v3
- Date: Tue, 08 Jul 2025 13:24:58 GMT
- Title: Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
- Authors: Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang
- Abstract summary: We tackle the problem of learning to play 3v3 multi-drone volleyball. The task requires both high-level strategic coordination and low-level agile control. We propose Hierarchical Co-Self-Play, a hierarchical reinforcement learning framework.
- Score: 13.062481157503495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations: (I) training diverse low-level skills, (II) learning high-level strategy via self-play with fixed low-level controllers, and (III) joint fine-tuning through co-self-play. Experiments show that HCSP achieves superior performance, outperforming non-hierarchical self-play and rule-based hierarchical baselines with an average 82.9% win rate and a 71.5% win rate against the two-stage variant. Moreover, co-self-play leads to emergent team behaviors such as role switching and coordinated formations, demonstrating the effectiveness of our hierarchical design and training scheme. The project page is at https://sites.google.com/view/hi-co-self-play.
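The three-stage pipeline lends itself to a compact summary in code. The sketch below mirrors the staging described in the abstract; every class, loop bound, and helper is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of the three-stage HCSP training schedule. All names,
# loop bounds, and update rules are illustrative assumptions.
import copy
import random


class LowLevelSkill:
    """Decentralized motion controller for one drone skill (stub)."""
    def update(self, experience):
        pass  # one RL update on this skill's own reward (stubbed)


class HighLevelPolicy:
    """Centralized strategic policy that assigns skills to drones (stub)."""
    def update(self, experience):
        pass  # one RL update (stubbed)


def rollout(team, opponent):
    """Play one 3v3 match; return the collected experience (stubbed)."""
    return []


# Stage I: train a diverse set of low-level skills from scratch.
skills = [LowLevelSkill() for _ in range(4)]
for _ in range(100):
    for skill in skills:
        skill.update(experience=[])

# Stage II: self-play over the high-level strategy, skills frozen.
policy = HighLevelPolicy()
pool = [copy.deepcopy(policy)]                # population of past opponents
for step in range(100):
    opponent = random.choice(pool)
    policy.update(rollout((policy, skills), (opponent, skills)))
    if step % 10 == 0:
        pool.append(copy.deepcopy(policy))    # grow the opponent population

# Stage III: co-self-play, fine-tuning both levels jointly.
for _ in range(100):
    opponent = random.choice(pool)
    experience = rollout((policy, skills), (opponent, skills))
    policy.update(experience)
    for skill in skills:
        skill.update(experience)              # low level no longer frozen
```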
Related papers
- Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams [25.885025521590414]
We present a hierarchical multi-agent reinforcement learning (MARL) framework that enables fully autonomous and decentralized quadruped robot soccer. First, a set of highly dynamic low-level skills is trained for legged locomotion and ball manipulation, such as walking, dribbling, and kicking. On top of these, a high-level strategic planning policy is trained with Multi-Agent Proximal Policy Optimization (MAPPO) via Fictitious Self-Play.
arXiv Detail & Related papers (2025-05-20T02:20:54Z)
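The entry above separates pretrained locomotion skills from a strategic policy trained with MAPPO via Fictitious Self-Play. Below is a minimal sketch of the resulting two-level control loop; the skill set, observation format, and random high-level policy are assumptions for illustration only.

```python
# Two-level control: a strategic policy picks a pretrained skill, and the
# skill maps observations to motor commands. All names are illustrative.
import random


def walk(obs):
    return "walk_motor_cmd"      # pretrained locomotion skill (stubbed)


def dribble(obs):
    return "dribble_motor_cmd"   # pretrained ball-manipulation skill (stubbed)


def kick(obs):
    return "kick_motor_cmd"      # pretrained ball-manipulation skill (stubbed)


SKILLS = {"walk": walk, "dribble": dribble, "kick": kick}


def high_level_policy(obs):
    # Strategic planner (trained with MAPPO via fictitious self-play in the
    # paper); stubbed here as a uniform random choice over skills.
    return random.choice(list(SKILLS))


obs = {"ball": (0.0, 0.0), "self": (1.0, 2.0)}
for _ in range(5):
    skill_name = high_level_policy(obs)   # high level: which skill to run
    command = SKILLS[skill_name](obs)     # low level: produce motor commands
    print(skill_name, "->", command)
```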
- VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play [26.51145484716405]
We present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering.
arXiv Detail & Related papers (2025-02-04T02:07:23Z)
- A Hierarchical Reinforcement Learning Framework for Multi-UAV Combat Using Leader-Follower Strategy [3.095786524987445]
Multi-UAV air combat is a complex task involving multiple autonomous UAVs. Previous approaches predominantly discretize the action space into predefined actions. We propose a hierarchical framework utilizing the Leader-Follower Multi-Agent Proximal Policy Optimization strategy.
arXiv Detail & Related papers (2025-01-22T02:41:36Z)
- Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach [11.740631954398292]
Pommerman is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents. This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play.
arXiv Detail & Related papers (2024-06-30T11:14:29Z)
- Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games [13.060023718506917]
We develop a new multi-agent imitation learning model for predicting the opponents' next moves.
We also present a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into a single training process.
Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.
arXiv Detail & Related papers (2023-08-20T07:30:13Z)
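The entry above merges opponent-move prediction with policy optimization in a single training process. One common way to realize that is a weighted joint objective, sketched below; the loss values, weighting, and structure are assumptions, not the paper's exact formulation.

```python
# Joint objective combining an RL loss with an opponent-imitation loss.
# Values and the weighting LAMBDA are placeholders for illustration.

def rl_loss(batch):
    return 0.42   # policy-optimization surrogate loss (stubbed)

def opponent_prediction_loss(batch):
    return 0.17   # how poorly we predict opponents' next moves (stubbed)

LAMBDA = 0.5      # trade-off weight between the two terms (assumed)

batch = []        # placeholder for collected trajectories
total = rl_loss(batch) + LAMBDA * opponent_prediction_loss(batch)
# One optimizer step on `total` updates the policy and the opponent model
# together, i.e., a single training process for both components.
print(total)
```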
- Accelerating Self-Supervised Learning via Efficient Training Strategies [98.26556609110992]
The time to train self-supervised deep networks remains an order of magnitude larger than that of their supervised counterparts.
Motivated by these issues, this paper investigates reducing the training time of recent self-supervised methods.
arXiv Detail & Related papers (2022-12-11T21:49:39Z)
- Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation [13.670618752160594]
Deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents with their environments.
Traditional DRL solutions suffer from the high dimensionality of multi-agent policy search over continuous action spaces.
We propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search.
arXiv Detail & Related papers (2022-06-25T19:09:29Z)
- Learning to Transfer Role Assignment Across Team Sizes [48.43860606706273]
We propose a framework to learn role assignment and transfer across team sizes.
We demonstrate that re-using the role-based credit assignment structure can foster the learning process of larger reinforcement learning teams.
arXiv Detail & Related papers (2022-04-17T11:22:01Z)
- Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents [120.91291581594773]
We present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes.
We propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training.
To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems.
arXiv Detail & Related papers (2022-03-16T08:22:45Z)
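The coach-assisted entry above lets a virtual coach adjust the crash rate during training. A minimal sketch of one plausible curriculum rule follows; the thresholds, increments, and stub environment are assumptions, not the paper's schedule.

```python
# Virtual coach raising the crash probability as the team becomes robust.
# The rule, thresholds, and stub environment are illustrative assumptions.
import random

crash_rate = 0.0

def run_episode(crash_rate):
    # Stub environment: each agent crashes with probability crash_rate;
    # returns the team's episode return.
    return random.random() - crash_rate

for _ in range(1000):
    episode_return = run_episode(crash_rate)
    # Coach rule: once the team copes well at the current crash rate,
    # make crashes more frequent so robustness is learned gradually.
    if episode_return > 0.3 and crash_rate < 0.5:
        crash_rate += 0.01
```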
- It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation [107.10235120286352]
Training general-purpose reinforcement learning agents efficiently requires automatic generation of a goal curriculum.
We propose Curriculum Self Play (CuSP), an automated goal generation framework.
We demonstrate that our method succeeds at generating effective goal curricula for a range of control tasks.
arXiv Detail & Related papers (2022-02-22T01:23:23Z)
- JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning [13.57305458734617]
We propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task.
To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering.
arXiv Detail & Related papers (2021-12-07T09:24:49Z)
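The JueWu-MC entry above lists ensemble behavior cloning with consistency filtering among its techniques. One plausible reading of that filter, keeping a sample only when most ensemble members agree on the action, is sketched below; the threshold and voting rule are assumptions.

```python
# Consistency filtering for ensemble behavior cloning: keep a state only
# when most ensemble policies vote for the same action. The threshold and
# voting scheme are illustrative assumptions.
from collections import Counter


def consistency_filter(states, ensemble, threshold=0.8):
    kept = []
    for state in states:
        votes = Counter(policy(state) for policy in ensemble)
        action, count = votes.most_common(1)[0]
        if count / len(ensemble) >= threshold:
            kept.append((state, action))   # ensemble agrees: keep the sample
        # otherwise the sample is discarded as inconsistent
    return kept


# Toy usage with three stub policies over integer "states".
ensemble = [lambda s: s % 2, lambda s: s % 2, lambda s: 0]
print(consistency_filter([0, 1, 2, 3], ensemble))  # keeps states 0 and 2
```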
- HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism [17.993973801986677]
Multi-agent reinforcement learning often suffers from the joint action space that grows exponentially with the number of agents.
We propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems.
arXiv Detail & Related papers (2021-10-14T10:43:47Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to a series of methods that nest reinforcement learning (RL) algorithms within population dynamics.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition [88.26752130107259]
In real-world multiagent systems, agents with different capabilities may join or leave without altering the team's overarching goals.
We propose COPA, a coach-player framework to tackle this problem.
We 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.
arXiv Detail & Related papers (2021-05-18T17:27:37Z)
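The COPA entry above has the coach decide when to communicate with the players. A simple gating rule that fits that description, broadcasting only when the intended strategy has drifted enough, is sketched below; the distance measure and threshold are assumptions, not the paper's learned mechanism.

```python
# Event-triggered communication: the coach broadcasts a strategy vector
# only when it differs enough from the last one sent. The L1 distance and
# threshold are illustrative assumptions.

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

THRESHOLD = 0.5
last_sent = [0.0, 0.0]

def maybe_communicate(strategy):
    global last_sent
    if l1_distance(strategy, last_sent) > THRESHOLD:
        last_sent = list(strategy)
        return strategy   # broadcast: players receive the new strategy
    return None           # stay silent: players keep acting on the old one

print(maybe_communicate([0.1, 0.1]))  # None, too close to the last message
print(maybe_communicate([0.9, 0.0]))  # broadcasts the updated strategy
```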
- Learning Agile Locomotion via Adversarial Training [59.03007947334165]
In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape.
We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort.
In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility.
arXiv Detail & Related papers (2020-08-03T01:20:37Z)
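The final entry above credits an ensemble of adversaries, each specializing in a different escaping strategy, for the protagonist's agility. A minimal sketch of that alternating zero-sum update follows; the agents, returns, and pool size are stubs and assumptions.

```python
# Protagonist vs. an ensemble of adversaries: each episode samples one
# adversary, and the pair is updated with opposite rewards (zero-sum).
# All classes and numbers are illustrative assumptions.
import random


class Agent:
    def update(self, reward):
        pass  # one policy-gradient step (stubbed)


def play(protagonist, adversary):
    return random.uniform(-1.0, 1.0)  # protagonist's episode return (stubbed)


protagonist = Agent()
adversaries = [Agent() for _ in range(4)]  # one per escaping specialty

for _ in range(1000):
    adversary = random.choice(adversaries)  # vary the opponent each episode
    episode_return = play(protagonist, adversary)
    protagonist.update(episode_return)      # chaser maximizes the return
    adversary.update(-episode_return)       # escaper minimizes it
```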
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.