Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty
- URL: http://arxiv.org/abs/2406.07877v2
- Date: Fri, 25 Oct 2024 08:35:19 GMT
- Title: Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty
- Authors: Qizhen Wu, Kexin Liu, Lei Chen, Jinhu Lü,
- Abstract summary: High uncertainty caused by unknown opponents' strategies, dynamic obstacles, and insufficient training complicates the action space into a hybrid decision process.
We propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and the underlying dynamic interaction mechanism.
To overcome the unstable training process introduced by the two layers, we design an integration training method including pre-training and cross-training.
- Score: 12.122881147337505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In swarm robotics, confrontation including the pursuit-evasion game is a key scenario. High uncertainty caused by unknown opponents' strategies, dynamic obstacles, and insufficient training complicates the action space into a hybrid decision process. Although the deep reinforcement learning method is significant for swarm confrontation since it can handle various sizes, as an end-to-end implementation, it cannot deal with the hybrid process. Here, we propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and the underlying dynamic interaction mechanism between the two layers, which indicates the quantified uncertainty. It decouples the hybrid process into discrete allocation and continuous planning layers, with a probabilistic ensemble model to quantify the uncertainty and regulate the interaction frequency adaptively. Furthermore, to overcome the unstable training process introduced by the two layers, we design an integration training method including pre-training and cross-training, which enhances the training efficiency and stability. Experiment results in both comparison, ablation, and real-robot studies validate the effectiveness and generalization performance of our proposed approach. In our defined experiments with twenty to forty agents, the win rate of the proposed method reaches around ninety percent, outperforming other traditional methods.
Related papers
- Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation [12.122881147337505]
In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making.
Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers.
Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers.
arXiv Detail & Related papers (2025-04-22T13:22:58Z) - Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions [38.48223545539604]
We develop a novel two-step learning method called PRC: preference-based reinforcement learning with constrained actions.
We empirically verify that our method has high learning efficiency on various datasets in robotic control environments.
arXiv Detail & Related papers (2023-12-30T21:37:18Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the superior qualities of each direction over one another, we introduce a novel method that lies at the junction of the two routes.
We name our method R-Mix following the concept of "Random Mix-up"
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid
Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - Decorrelative Network Architecture for Robust Electrocardiogram
Classification [4.808817930937323]
It is not possible to train networks that are accurate in all scenarios.
Deep learning methods sample the model parameter space to estimate uncertainty.
These parameters are often subject to the same vulnerabilities, which can be exploited by adversarial attacks.
We propose a novel ensemble approach based on feature decorrelation and Fourier partitioning for teaching networks diverse complementary features.
arXiv Detail & Related papers (2022-07-19T02:36:36Z) - Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept of adversarial training graph (ATG) with which the proposed adversarial training with feature separability (ATFS) enables to boost the intra-class feature similarity and increase inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly uses an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.