Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games
- URL: http://arxiv.org/abs/2403.10794v2
- Date: Thu, 08 May 2025 21:52:16 GMT
- Title: Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games
- Authors: Zixuan Wu, Sean Ye, Manisha Natarajan, Matthew C. Gombolay,
- Abstract summary: We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data.<n>We show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal reaching rate.
- Score: 6.532258098619471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data, while a low-level RL policy reasons about evasive versus global path-following behavior. The benchmark results across different domains and different observability show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal reaching rate, which leads to 51.4% increasing of the performance score on average. Additionally, our method improves interpretability, flexibility and efficiency of the learned policy.
Related papers
- Action Flow Matching for Continual Robot Learning [57.698553219660376]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks.
We introduce a generative framework leveraging flow matching for online robot dynamics model alignment.
We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z) - Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.
By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.
On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z) - Dynamic Path Navigation for Motion Agents with LLM Reasoning [69.5875073447454]
Large Language Models (LLMs) have demonstrated strong generalizable reasoning and planning capabilities.
We explore the zero-shot navigation and path generation capabilities of LLMs by constructing a dataset and proposing an evaluation protocol.
We demonstrate that, when tasks are well-structured in this manner, modern LLMs exhibit substantial planning proficiency in avoiding obstacles while autonomously refining navigation with the generated motion to reach the target.
arXiv Detail & Related papers (2025-03-10T13:39:09Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.<n>We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.<n>We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - Multi-agent Path Finding for Timed Tasks using Evolutionary Games [1.3023548510259344]
We show that our algorithm is faster than deep RL methods by at least an order of magnitude.
Our results indicate that it scales better with an increase in the number of agents as compared to other methods.
arXiv Detail & Related papers (2024-11-15T20:10:25Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.<n>We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z) - LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance [16.81917489473445]
The conditional diffusion model has been demonstrated as an efficient tool for learning robot policies.
The intricate nature of real-world scenarios, characterized by dynamic obstacles and maze-like structures, underscores the complexity of robot local navigation decision-making.
arXiv Detail & Related papers (2024-07-02T04:53:35Z) - Improving Generalization in Aerial and Terrestrial Mobile Robots Control Through Delayed Policy Learning [0.19638749905454383]
Deep Reinforcement Learning (DRL) has emerged as a promising approach to enhancing motion control and decision-making.
This paper explores the impact of the Delayed Policy Updates (DPU) technique on fostering generalization to new situations.
arXiv Detail & Related papers (2024-06-04T04:16:38Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - Mission-driven Exploration for Accelerated Deep Reinforcement Learning
with Temporal Logic Task Specifications [11.812602599752294]
We consider robots with unknown dynamics operating in environments with unknown structure.
Our goal is to synthesize a control policy that maximizes the probability of satisfying an automaton-encoded task.
We propose a novel DRL algorithm, which has the capability to learn control policies at a notably faster rate compared to similar methods.
arXiv Detail & Related papers (2023-11-28T18:59:58Z) - Distributed multi-agent target search and tracking with Gaussian process
and reinforcement learning [26.499110405106812]
We propose a multi-agent reinforcement learning technique with target map building based on distributed process.
We evaluate the performance and transferability of the trained policy in simulation and demonstrate the method on a swarm of micro unmanned aerial vehicles.
arXiv Detail & Related papers (2023-08-29T01:53:14Z) - DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in
Complex Environments [55.204450019073036]
We present a novel reinforcement learning based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments.
We consider the problem of joint decentralized task allocation and navigation and present a two level approach to solve it.
We observe improvement up to 14% in terms of task completion time and up-to 40% improvement in terms of computing collision-free trajectories for the robots.
arXiv Detail & Related papers (2022-09-07T00:35:27Z) - SABER: Data-Driven Motion Planner for Autonomously Navigating
Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z) - Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via
Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for
Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z) - MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement
Learning in Mixed Dynamic Environments [30.407700996710023]
This paper proposes a decentralized partially observable multi-agent path planning with evolutionary reinforcement learning (MAPPER) method.
We decompose the long-range navigation task into many easier sub-tasks under the guidance of a global planner.
Our approach models dynamic obstacles' behavior with an image-based representation and trains a policy in mixed dynamic environments without homogeneity assumption.
arXiv Detail & Related papers (2020-07-30T20:14:42Z) - Learning to Track Dynamic Targets in Partially Known Environments [48.49957897251128]
We use a deep reinforcement learning approach to solve active target tracking.
In particular, we introduce Active Tracking Target Network (ATTN), a unified RL policy that is capable of solving major sub-tasks of active target tracking.
arXiv Detail & Related papers (2020-06-17T22:45:24Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Mobile Robot Path Planning in Dynamic Environments through Globally
Guided Reinforcement Learning [12.813442161633116]
We introduce a globally guided learning reinforcement approach (G2RL) to solve the multi-robot planning problem.
G2RL incorporates a novel path reward structure that generalizes to arbitrary environments.
We evaluate our method across different map types, obstacle densities and the number of robots.
arXiv Detail & Related papers (2020-05-11T20:42:29Z) - Model-based Reinforcement Learning for Decentralized Multiagent
Rendezvous [66.6895109554163]
Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans.
We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous.
arXiv Detail & Related papers (2020-03-15T19:49:20Z) - Enhanced Adversarial Strategically-Timed Attacks against Deep
Reinforcement Learning [91.13113161754022]
We introduce timing-based adversarial strategies against a DRL-based navigation system by jamming in physical noise patterns on the selected time frames.
Our experimental results show that the adversarial timing attacks can lead to a significant performance drop.
arXiv Detail & Related papers (2020-02-20T21:39:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.