Reinforcement Learning for Robot Navigation with Adaptive Forward
Simulation Time (AFST) in a Semi-Markov Model
- URL: http://arxiv.org/abs/2108.06161v4
- Date: Tue, 4 Jul 2023 12:43:55 GMT
- Title: Reinforcement Learning for Robot Navigation with Adaptive Forward
Simulation Time (AFST) in a Semi-Markov Model
- Authors: Yu'an Chen, Ruosong Ye, Ziyang Tao, Hongjian Liu, Guangda Chen, Jie
Peng, Jun Ma, Yu Zhang, Jianmin Ji and Yanyong Zhang
- Abstract summary: We propose the first DRL-based navigation method modeled by a semi-Markov decision process (SMDP) with continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem.
- Score: 20.91419349793292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (DRL) algorithms have proven effective in robot
navigation, especially in unknown environments, by directly mapping perception
inputs into robot control commands. However, most existing methods ignore the
local minimum problem in navigation and thereby cannot handle complex unknown
environments. In this paper, we propose the first DRL-based navigation method
modeled by a semi-Markov decision process (SMDP) with continuous action space,
named Adaptive Forward Simulation Time (AFST), to overcome this problem.
Specifically, we reduce the dimensions of the action space and improve the
distributed proximal policy optimization (DPPO) algorithm for the specified
SMDP problem by modifying its GAE to better estimate the policy gradient in
SMDPs. Experiments in various unknown environments demonstrate the
effectiveness of AFST.
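The abstract's key algorithmic change is adapting generalized advantage estimation (GAE) to an SMDP, where each action persists for a variable duration (the forward simulation time chosen by the policy), so the discount across a transition becomes gamma raised to that duration rather than a fixed gamma. The sketch below illustrates this duration-aware discounting under that assumption; the function name and exact formulation are illustrative, not the authors' code.

```python
def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """GAE with variable-duration (SMDP) transitions.

    Each step t carries a duration tau_t, so the discount applied
    across that transition is gamma**tau_t instead of a fixed gamma.
    Assumes the episode terminates after the last step (bootstrap
    value 0). Illustrative sketch, not the AFST implementation.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        discount = gamma ** durations[t]      # duration-aware discount
        delta = rewards[t] + discount * next_value - values[t]
        gae = delta + discount * lam * gae    # propagate with the same discount
        advantages[t] = gae
    return advantages
```

With all durations equal to 1 this reduces to standard GAE, which is a quick sanity check for the modification.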
Related papers
- Deep-Sea A*+: An Advanced Path Planning Method Integrating Enhanced A* and Dynamic Window Approach for Autonomous Underwater Vehicles [1.3807821497779342]
Extreme conditions in the deep-sea environment pose significant challenges for underwater operations.
We propose an advanced path planning methodology that integrates an improved A* algorithm with the Dynamic Window Approach (DWA).
Our proposed method surpasses the traditional A* algorithm in terms of path smoothness, obstacle avoidance, and real-time performance.
arXiv Detail & Related papers (2024-10-22T07:29:05Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning [6.037202026682975]
We consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method.
For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance.
In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV.
In the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the
arXiv Detail & Related papers (2024-05-04T06:18:15Z)
- Variational Autoencoders for exteroceptive perception in reinforcement learning-based collision avoidance [0.0]
Deep Reinforcement Learning (DRL) has emerged as a promising control framework.
Current DRL algorithms require disproportionately large computational resources to find near-optimal policies.
This paper presents a comprehensive exploration of our proposed approach in maritime control systems.
arXiv Detail & Related papers (2024-03-31T09:25:28Z)
- GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments [2.982218441172364]
This study presents GP-MPPI, an online learning-based control strategy that integrates Model Predictive Path Integral (MPPI) control with a local perception model.
We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks.
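For context, the core MPPI step samples perturbed control sequences, scores them with a rollout cost, and returns an exponentially cost-weighted average. The sketch below is a minimal single-iteration version of that standard update; the function name, defaults, and 1-D control assumption are illustrative and not taken from the GP-MPPI paper.

```python
import math
import random

def mppi_update(u_nominal, rollout_cost, n_samples=64, sigma=0.5, lam=1.0):
    """One MPPI iteration over a control sequence (illustrative sketch).

    Samples Gaussian perturbations around the nominal controls, scores
    each perturbed sequence with rollout_cost, and blends them with
    weights exp(-cost / lam), so low-cost samples dominate.
    """
    H = len(u_nominal)
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [random.gauss(0.0, sigma) for _ in range(H)]
        u = [u_nominal[t] + eps[t] for t in range(H)]
        noises.append(eps)
        costs.append(rollout_cost(u))
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    w_sum = sum(weights)
    return [u_nominal[t]
            + sum(w * eps[t] for w, eps in zip(weights, noises)) / w_sum
            for t in range(H)]
```

Smaller `lam` concentrates the weights on the best-scoring samples; larger `lam` averages more broadly.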
arXiv Detail & Related papers (2023-07-08T17:33:20Z)
- DDPEN: Trajectory Optimisation With Sub Goal Generation Model [70.36888514074022]
In this paper, we present Differential Dynamic Programming with Escape Network (DDPEN), a novel trajectory optimisation method.
We propose to utilize a deep model that takes as input a map of the environment in the form of a costmap together with the desired position.
The model produces possible future directions that lead towards the goal while avoiding local minima, and is able to run in real-time conditions.
arXiv Detail & Related papers (2023-01-18T11:02:06Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
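For context on what ZO-RL improves: standard zeroth-order (ZO) optimization estimates a gradient purely from function evaluations along randomly sampled directions. The sketch below shows that plain random-sampling baseline; ZO-RL's contribution is to draw the directions from a learned sampling policy instead, which is not shown here.

```python
import random

def zo_gradient(f, x, mu=1e-4, n_dirs=20):
    """Two-point zeroth-order gradient estimate with random Gaussian
    directions: grad ~ mean of ((f(x + mu*u) - f(x)) / mu) * u.

    This is the random-sampling baseline; replacing the random
    directions u with samples from a learned policy is the idea
    behind ZO-RL. Illustrative sketch, names are assumptions.
    """
    d = len(x)
    grad = [0.0] * d
    fx = f(x)
    for _ in range(n_dirs):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [x[i] + mu * u[i] for i in range(d)]
        scale = (f(x_pert) - fx) / mu  # directional-derivative estimate
        for i in range(d):
            grad[i] += scale * u[i] / n_dirs
    return grad
```

The variance of this estimator grows with the dimension and the spread of the sampled directions, which is exactly the quantity a learned sampling policy tries to shrink.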
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.