Hierarchical Deep Deterministic Policy Gradient for Autonomous Maze Navigation of Mobile Robots
- URL: http://arxiv.org/abs/2508.04994v1
- Date: Thu, 07 Aug 2025 03:06:22 GMT
- Title: Hierarchical Deep Deterministic Policy Gradient for Autonomous Maze Navigation of Mobile Robots
- Authors: Wenjie Hu, Ye Zhou, Hann Woei Ho
- Abstract summary: This paper proposes an efficient Hierarchical DDPG (HDDPG) algorithm, which includes high-level and low-level policies. It significantly overcomes the limitations of standard DDPG and its variants, improving the success rate by at least 56.59% and boosting the average reward by a minimum of 519.03.
- Score: 5.834520772858807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maze navigation is a fundamental challenge in robotics, requiring agents to traverse complex environments efficiently. While the Deep Deterministic Policy Gradient (DDPG) algorithm excels in control tasks, its performance in maze navigation suffers from sparse rewards, inefficient exploration, and long-horizon planning difficulties, often leading to low success rates and low average rewards, sometimes even failing to achieve effective navigation. To address these limitations, this paper proposes an efficient Hierarchical DDPG (HDDPG) algorithm, which includes high-level and low-level policies. The high-level policy employs an advanced DDPG framework to generate intermediate subgoals from a long-term perspective and on a higher temporal scale. The low-level policy, also powered by the improved DDPG algorithm, generates primitive actions by observing current states and following the subgoal assigned by the high-level policy. The proposed method enhances stability with off-policy correction, refining subgoal assignments by relabeling historical experiences. Additionally, adaptive parameter space noise is utilized to improve exploration, and a reshaped intrinsic-extrinsic reward function is employed to boost learning efficiency. Further optimizations, including gradient clipping and Xavier initialization, are applied to improve robustness. The proposed algorithm is rigorously evaluated through numerical simulation experiments executed using the Robot Operating System (ROS) and Gazebo. Across three distinct final targets in autonomous maze navigation tasks, HDDPG significantly overcomes the limitations of standard DDPG and its variants, improving the success rate by at least 56.59% and boosting the average reward by a minimum of 519.03 compared to baseline algorithms.
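To make the hierarchy concrete, the sketch below illustrates how the pieces named in the abstract fit together: a high-level actor that emits subgoals on a coarser temporal scale, a low-level actor that tracks the current subgoal with primitive actions, a dense intrinsic reward combined with the extrinsic one, and Xavier-initialized actor networks. It is a minimal illustration under stated assumptions, not the authors' implementation: the names (`HDDPGSketch`, `xavier_mlp`, `rollout`, `env_step`), the 10-step subgoal horizon, and the assumption that the first two state entries are the robot's (x, y) position are placeholders, and the DDPG critics, replay buffers, off-policy subgoal relabeling, adaptive parameter space noise, and gradient clipping are omitted.

```python
# Minimal sketch of the hierarchical subgoal / primitive-action loop described above.
# All names and the environment interface are illustrative assumptions, not the paper's code.
import numpy as np
import torch
import torch.nn as nn


def xavier_mlp(in_dim: int, out_dim: int, hidden: int = 64) -> nn.Sequential:
    """Small actor network with Xavier-initialized weights, as mentioned in the abstract."""
    net = nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim), nn.Tanh(),
    )
    for layer in net:
        if isinstance(layer, nn.Linear):
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)
    return net


class HDDPGSketch:
    """High-level actor emits a subgoal every `subgoal_horizon` steps;
    the low-level actor tracks the current subgoal with primitive actions."""

    def __init__(self, state_dim=4, goal_dim=2, action_dim=2, subgoal_horizon=10):
        self.high_actor = xavier_mlp(state_dim + goal_dim, goal_dim)   # (state, final goal) -> subgoal
        self.low_actor = xavier_mlp(state_dim + goal_dim, action_dim)  # (state, subgoal)    -> action
        self.subgoal_horizon = subgoal_horizon

    def subgoal(self, state, final_goal):
        x = torch.as_tensor(np.concatenate([state, final_goal]), dtype=torch.float32)
        return self.high_actor(x).detach().numpy()

    def action(self, state, subgoal):
        x = torch.as_tensor(np.concatenate([state, subgoal]), dtype=torch.float32)
        return self.low_actor(x).detach().numpy()

    @staticmethod
    def intrinsic_reward(position, subgoal):
        # Dense intrinsic signal: negative Euclidean distance to the current subgoal.
        return -float(np.linalg.norm(position - subgoal))


def rollout(agent, env_step, initial_state, final_goal, episode_len=200):
    """`env_step(state, action) -> (next_state, extrinsic_reward, done)` stands in for
    the ROS/Gazebo maze simulation used in the paper."""
    state, total_reward = initial_state, 0.0
    for t in range(episode_len):
        if t % agent.subgoal_horizon == 0:                        # higher temporal scale
            subgoal = agent.subgoal(state, final_goal)
        action = agent.action(state, subgoal)                     # primitive action
        next_state, r_ext, done = env_step(state, action)
        r_int = agent.intrinsic_reward(next_state[:2], subgoal)   # assumes (x, y) are the first two state entries
        total_reward += r_ext + r_int                             # reshaped intrinsic-extrinsic reward
        state = next_state
        if done:
            break
    return total_reward
```

Plugging in a toy `env_step` that integrates the action and returns the negative distance to the final goal as the extrinsic reward is enough to exercise the loop; training the high- and low-level critics off-policy, relabeling subgoals, and injecting parameter space noise would sit on top of this skeleton.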
Related papers
- Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control [5.084000938840218]
This paper proposes a reinforcement learning algorithm called Robust Deterministic Policy Gradient (RDPG).
RDPG formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game.
We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation.
arXiv Detail & Related papers (2025-02-28T13:58:22Z)
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Autonomous Navigation of Unmanned Vehicle Through Deep Reinforcement Learning [1.3725832537448668]
The paper details the model of an Ackermann robot and the structure and application of the DDPG algorithm.
The results demonstrate that the DDPG algorithm outperforms traditional Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) algorithms in path planning tasks.
arXiv Detail & Related papers (2024-07-18T05:18:59Z)
- Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation [72.24964965882783]
Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error.
Real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies.
We introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Autonomous Platoon Control with Integrated Deep Reinforcement Learning and Dynamic Programming [12.661547303266252]
It is more challenging to learn a stable and efficient car-following policy when there are multiple following vehicles in a platoon.
We adopt an integrated DRL and Dynamic Programming approach to learn autonomous platoon control policies.
We propose an algorithm, namely Finite-Horizon-DDPG with Sweeping through reduced state space.
arXiv Detail & Related papers (2022-06-15T13:45:47Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- Multi-Agent Path Planning based on MPC and DDPG [14.793341914236166]
We propose a new algorithm combining Model Predictive Control (MPC) with Deep Deterministic Policy Gradient (DDPG).
The DDPG with continuous action space is designed to provide learning and autonomous decision-making capability for robots.
We employ Unity 3D to perform simulation experiments in highly uncertain environments such as aircraft carrier decks and squares.
arXiv Detail & Related papers (2021-02-26T02:57:13Z)
- Zeroth-order Deterministic Policy Gradient [116.87117204825105]
We introduce Zeroth-order Deterministic Policy Gradient (ZDPG).
ZDPG approximates policy-reward gradients via two-point evaluations of the $Q$-function.
New finite sample complexity bounds for ZDPG improve upon existing results by up to two orders of magnitude.
arXiv Detail & Related papers (2020-06-12T16:52:29Z)
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
- Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping [7.132368785057316]
We propose revised Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms with an improved reward shaping technique.
We compare the original DDPG and PPO with the revised versions of both in simulations with a real mobile robot and demonstrate that the proposed algorithms achieve better results.
arXiv Detail & Related papers (2020-03-28T18:29:56Z)