Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation
- URL: http://arxiv.org/abs/2504.15876v2
- Date: Wed, 23 Apr 2025 15:00:10 GMT
- Title: Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation
- Authors: Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lü
- Abstract summary: In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making. Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers. Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers.
- Score: 12.122881147337505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making that integrates discrete commands and continuous actions. Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers, limiting adaptability in dynamic environments. Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers. This method effectively maps commands to task allocation and actions to path planning, while leveraging cross-training techniques to enhance learning across the hierarchical framework. Furthermore, we introduce a trajectory prediction model that bridges abstract task representations with actionable planning goals. In our experiments, the method achieves a confrontation win rate above 80% and a decision time below 0.01 seconds, outperforming existing approaches. Large-scale tests and real-world robot experiments further demonstrate the generalization capability and practical applicability of our method.
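To make the layered data flow concrete, the sketch below walks through one decision step of a bidirectional two-layer scheme like the one described above: discrete commands become a task allocation, a trajectory predictor converts the allocation into planning goals, and the motion layer emits continuous actions. The nearest-target rule, the constant-velocity predictor, and all function names are illustrative stand-ins, not the paper's learned policies.

```python
import numpy as np

# Minimal sketch of one bidirectional decision step: the task layer emits a
# discrete assignment, a trajectory predictor turns it into planning goals,
# and the motion layer produces continuous actions. All names and rules here
# are illustrative stand-ins, not the paper's learned policies.

rng = np.random.default_rng(0)

def allocate_tasks(robots, enemies):
    """Upper layer stand-in: assign each robot its nearest enemy."""
    dists = np.linalg.norm(robots[:, None, :2] - enemies[None, :, :2], axis=-1)
    return dists.argmin(axis=1)                      # discrete command per robot

def predict_trajectory(enemies, horizon=5, dt=0.1):
    """Stand-in for the trajectory prediction model bridging tasks and
    planning goals: a constant-velocity rollout."""
    return enemies[:, :2] + enemies[:, 2:4] * horizon * dt

def plan_motion(robots, goals, max_speed=1.0):
    """Lower layer stand-in: continuous velocity toward the assigned goal."""
    direction = goals - robots[:, :2]
    norm = np.linalg.norm(direction, axis=-1, keepdims=True) + 1e-8
    return max_speed * direction / norm

robots = rng.normal(size=(4, 4))                     # [x, y, vx, vy] per robot
enemies = rng.normal(size=(3, 4))

assignment = allocate_tasks(robots, enemies)         # commands -> task allocation
goals = predict_trajectory(enemies)[assignment]      # tasks -> planning goals
actions = plan_motion(robots, goals)                 # goals -> continuous actions
print(assignment, actions.round(2))
```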
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that plans responses under the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
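As a rough illustration of the opponent modeling idea, the sketch below maintains a belief over a finite set of candidate opponent goals and updates it from observed motion with a soft-rationality likelihood; the goal set and likelihood model are invented for the example and are not HOP's actual components.

```python
import numpy as np

# Hedged sketch of goal inference: maintain a belief over candidate opponent
# goals and update it Bayesian-style from observed moves. Moves that point
# toward a goal make that goal more likely under a soft "rationality" model.

def update_belief(belief, opp_pos, opp_move, goals, beta=4.0):
    """Posterior over goals given one observed movement direction."""
    directions = goals - opp_pos
    directions /= np.linalg.norm(directions, axis=-1, keepdims=True) + 1e-8
    move = opp_move / (np.linalg.norm(opp_move) + 1e-8)
    likelihood = np.exp(beta * directions @ move)    # alignment-based likelihood
    posterior = belief * likelihood
    return posterior / posterior.sum()

goals = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])  # candidate goals
belief = np.ones(3) / 3                                   # uniform prior
belief = update_belief(belief, np.zeros(2), np.array([1.0, 0.1]), goals)
print(belief.round(3))  # mass shifts to the goal at (5, 0)
```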
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty [12.122881147337505]
High uncertainty caused by unknown opponent strategies, dynamic obstacles, and insufficient training turns the action space into a hybrid decision process.
We propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and the underlying dynamic interaction mechanism.
To overcome the unstable training process introduced by the two layers, we design an integrated training method that combines pre-training and cross-training.
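The schedule below sketches how such an integrated pre-training/cross-training regime could be organized; the Trainer class is a hypothetical stand-in for the paper's two RL training loops.

```python
# Schematic of an integrated training schedule: each layer is first pre-trained
# with the other held fixed, then the two layers alternate (cross-train).
# The Trainer class is a hypothetical placeholder, not the paper's code.

class Trainer:
    def __init__(self, name):
        self.name, self.steps = name, 0

    def train(self, steps, other_frozen):
        """Stand-in for an RL update loop over `steps` environment steps."""
        self.steps += steps
        mode = "pre-train" if other_frozen else "cross-train"
        print(f"{self.name}: {mode} for {steps} steps (total {self.steps})")

def integrated_training(alloc, plan, pretrain=1000, rounds=3, per_round=500):
    # Phase 1: pre-train each layer in isolation to stabilize early learning.
    alloc.train(pretrain, other_frozen=True)
    plan.train(pretrain, other_frozen=True)
    # Phase 2: alternate updates so each layer adapts to the other's current
    # behavior rather than a stale snapshot.
    for _ in range(rounds):
        alloc.train(per_round, other_frozen=False)
        plan.train(per_round, other_frozen=False)

integrated_training(Trainer("task-allocation layer"), Trainer("path-planning layer"))
```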
arXiv Detail & Related papers (2024-06-12T05:12:10Z)
- Multi-Agent Transfer Learning via Temporal Contrastive Learning [8.487274986507922]
This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning.
The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals.
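As one plausible instantiation of the temporal contrastive idea, the sketch below scores embeddings of states a few steps apart in the same trajectory against in-batch negatives with an InfoNCE-style loss; the random-projection encoder and all dimensions are toy choices, not the paper's architecture.

```python
import numpy as np

# InfoNCE-style temporal contrastive loss: embeddings of states a few steps
# apart in the same trajectory are pulled together; other rows in the batch
# act as negatives. The fixed random projection stands in for a learned encoder.

def info_nce(anchors, positives, temperature=0.1):
    """anchors[i] should match positives[i]; all other rows are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on the diagonal

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                         # stand-in state encoder
states_t = rng.normal(size=(32, 8)) @ W              # embeddings at time t
states_tk = states_t + 0.1 * rng.normal(size=(32, 16))  # nearby states at t+k
print(info_nce(states_t, states_tk))
```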
arXiv Detail & Related papers (2024-06-03T14:42:14Z)
- Deep hybrid models: infer and plan in a dynamic world [0.0]
We present a solution, based on active inference, for complex control tasks. The proposed architecture exploits hybrid (discrete and continuous) processing. We show that the model can tackle the presented task under different conditions.
arXiv Detail & Related papers (2024-02-01T15:15:25Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast yet surprisingly effective planning method that combines the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost.
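The sketch below illustrates the jumpy idea in its simplest form: the high level plans only sparse keypoints (wide receptive field at low cost), and the low level densifies each segment. Linear interpolation stands in for the low-level diffusion model.

```python
import numpy as np

# "Jumpy" hierarchical planning in miniature: the high level plans only every
# k-th state, and the low level fills in the segments between keypoints.
# Linear interpolation is a toy stand-in for the low-level diffusion model.

def jumpy_plan(start, goal, n_keypoints=4):
    """High level: coarse waypoints from start to goal."""
    alphas = np.linspace(0.0, 1.0, n_keypoints + 1)
    return start + alphas[:, None] * (goal - start)

def refine_segment(a, b, k=8):
    """Low level: dense trajectory between two adjacent keypoints."""
    alphas = np.linspace(0.0, 1.0, k, endpoint=False)
    return a + alphas[:, None] * (b - a)

keypoints = jumpy_plan(np.zeros(2), np.array([8.0, 4.0]))
dense = np.concatenate([refine_segment(a, b)
                        for a, b in zip(keypoints, keypoints[1:])])
print(keypoints.shape, dense.shape)  # (5, 2) coarse vs (32, 2) dense
```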
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Chain-of-Thought Predictive Control [32.30974063877643]
We study generalizable policy learning from demonstrations for complex low-level control.
We propose a novel hierarchical imitation learning method that utilizes sub-optimal demonstrations.
arXiv Detail & Related papers (2023-04-03T07:59:13Z)
- Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for two tracks of the SAPIEN ManiSkill Challenge 2021, including the No Interaction Track.
The No Interaction track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) to achieve high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to the robotic arms.
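The sketch below shows the general shape of such a rule-based decomposition; the sub-task names and controller rules are invented for illustration and are not the challenge entry's actual ones.

```python
import numpy as np

# Schematic rule-based decomposition: a manipulation task is split into an
# ordered list of sub-tasks, each driven by a simple hand-written controller.
# Sub-task names and rules are illustrative, not the challenge entry's.

def approach(obs):  # move the end effector toward the object, gripper open
    return {"arm": obs["object_pos"] - obs["ee_pos"], "gripper": 1.0}

def grasp(obs):     # hold position, close the gripper
    return {"arm": np.zeros(3), "gripper": -1.0}

def move_to(obs):   # carry the object toward the target, gripper closed
    return {"arm": obs["target_pos"] - obs["ee_pos"], "gripper": -1.0}

SUB_TASKS = [approach, grasp, move_to]

def select_action(obs, stage):
    """Dispatch to the current sub-task's controller; the rule-based check
    that advances `stage` is omitted for brevity."""
    return SUB_TASKS[stage](obs)

obs = {"ee_pos": np.zeros(3), "object_pos": np.array([0.3, 0.1, 0.0]),
       "target_pos": np.array([0.5, 0.5, 0.1])}
print(select_action(obs, 0))
```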
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Multi-lane Cruising Using Hierarchical Planning and Reinforcement Learning [3.7438459768783794]
Multi-lane cruising requires using lane changes and within-lane maneuvers to achieve good speed and maintain safety.
This paper proposes a design for autonomous multi-lane cruising by combining a hierarchical reinforcement learning framework with a novel state-action space abstraction.
arXiv Detail & Related papers (2021-10-01T21:03:39Z)
- Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
A central challenge of VLN is that the agent must attend to the meaningful parts of the language instruction that correspond to the dynamically varying visual environment.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
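One common way to realize such grounding is cross-modal attention, sketched below: the current visual feature queries the instruction tokens so the textual focus can shift with the scene. The dimensions and random features are toy values, and the module is a generic illustration rather than the paper's exact design.

```python
import numpy as np

# Generic cross-modal grounding as attention: the agent's current visual
# feature queries the encoded instruction tokens, producing a view-conditioned
# text summary. All features and sizes here are toy values.

def cross_modal_attend(visual_feat, word_feats, temperature=1.0):
    """Attend over instruction tokens conditioned on the current view."""
    scores = word_feats @ visual_feat / temperature   # (num_words,)
    scores -= scores.max()                            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over words
    grounded = weights @ word_feats                   # text summary for this view
    return grounded, weights

rng = np.random.default_rng(0)
words = rng.normal(size=(12, 64))    # encoded instruction tokens
view = rng.normal(size=64)           # current visual feature
grounded, attn = cross_modal_attend(view, words)
print(attn.argmax(), grounded.shape)
```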
arXiv Detail & Related papers (2020-11-22T09:13:46Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
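At the heart of such behavior-prior formulations is a KL-regularized objective, where the task reward is traded off against staying close to a learned prior policy. A minimal numeric sketch, with toy categorical distributions:

```python
import numpy as np

# KL-regularized per-step objective: r(s, a) - alpha * KL(pi(.|s) || pi0(.|s)),
# where pi0 is the learned behavior prior. The categorical distributions below
# are toy values for illustration.

def kl_regularized_reward(reward, pi, pi0, alpha=0.1):
    """Task reward minus an alpha-weighted KL penalty toward the prior."""
    kl = np.sum(pi * (np.log(pi) - np.log(pi0)))
    return reward - alpha * kl

pi = np.array([0.7, 0.2, 0.1])    # current policy over 3 actions
pi0 = np.array([0.4, 0.4, 0.2])   # behavior prior over the same actions
print(kl_regularized_reward(1.0, pi, pi0))
```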
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning [36.050432925402845]
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long-horizon control tasks.
We experimentally show that our method generalizes across unseen test environments and can scale to 3x the horizon length of both learning-based and non-learning-based methods.
arXiv Detail & Related papers (2020-02-14T10:19:52Z)