Dealing with Sparse Rewards in Continuous Control Robotics via
Heavy-Tailed Policies
- URL: http://arxiv.org/abs/2206.05652v1
- Date: Sun, 12 Jun 2022 04:09:39 GMT
- Title: Dealing with Sparse Rewards in Continuous Control Robotics via
Heavy-Tailed Policies
- Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar,
and Dinesh Manocha
- Abstract summary: We present a novel Heavy-Tailed Stochastic Policy Gradient (HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
- Score: 64.2210390071609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient
(HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous
control problems. Sparse reward is common in continuous control robotics tasks
such as manipulation and navigation, and makes the learning problem hard due to
non-trivial estimation of value functions over the state space. This demands
either reward shaping or expert demonstrations for the sparse reward
environment. However, obtaining high-quality demonstrations is quite expensive
and sometimes even impossible. We propose a heavy-tailed policy parametrization
together with a modified momentum-based policy gradient tracking scheme (HT-SPG)
to induce stable exploratory behavior in the algorithm. The proposed
algorithm does not require access to expert demonstrations. We test the
performance of HT-SPG on various benchmark tasks of continuous control with
sparse rewards such as 1D Mario, Pathological Mountain Car, Sparse Pendulum in
OpenAI Gym, and Sparse MuJoCo environments (Hopper-v2). We show consistent
performance improvement across all tasks in terms of high average cumulative
reward. HT-SPG also converges in fewer samples, underscoring the sample
efficiency of the proposed algorithm.
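For concreteness, below is a minimal sketch of the two ingredients the abstract names: a heavy-tailed policy parametrization and a momentum-based policy gradient tracking update. The Cauchy head, network sizes, and STORM-style recursion are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a heavy-tailed policy and a momentum-tracked gradient step.
# Everything here (Cauchy head, layer sizes, tracking recursion) is an
# editorial assumption for illustration, not the HT-SPG reference code.
import torch
import torch.nn as nn
from torch.distributions import Cauchy


class HeavyTailedPolicy(nn.Module):
    """Actor whose action distribution has heavy (Cauchy) tails for exploration."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.loc_net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, act_dim))
        self.log_scale = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        # Heavy tails place non-negligible mass on large actions, which helps
        # the agent stumble onto sparse rewards.
        return Cauchy(self.loc_net(obs), self.log_scale.exp())


def momentum_tracking_step(params, grad_new, grad_old, d_prev, beta=0.9, lr=1e-3):
    """One momentum-based gradient-tracking (STORM-style) ascent step:
       d_t = g_t + (1 - beta) * (d_{t-1} - g_{t-1})."""
    d_t = [g_n + (1.0 - beta) * (d_p - g_o)
           for g_n, g_o, d_p in zip(grad_new, grad_old, d_prev)]
    with torch.no_grad():
        for p, d in zip(params, d_t):
            p.add_(d, alpha=lr)  # gradient ascent on the expected return
    return d_t
```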
Related papers
- Trajectory-Oriented Policy Optimization with Sparse Rewards [2.9602904918952695]
We introduce an approach that leverages offline demonstration trajectories for faster and more efficient online RL in environments with sparse rewards.
Our pivotal insight involves treating offline demonstration trajectories as guidance, rather than mere imitation.
We then illustrate that this optimization problem can be streamlined into a policy-gradient algorithm, integrating rewards shaped by insights from offline demonstrations.
arXiv Detail & Related papers (2024-01-04T12:21:01Z)
- Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations [2.709826237514737]
The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning.
We propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG).
We show POSG's significant advantages in control performance and convergence speed in four sparse-reward environments.
arXiv Detail & Related papers (2023-12-30T07:41:45Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model [24.030426634281643]
In continuous control tasks, the widely used Gaussian policies result in ineffective exploration of the environment.
We propose a density-free off-policy algorithm, Generative Actor-Critic, using the push-forward model to increase the expressiveness of policies.
We show that push-forward policies possess desirable features, such as multi-modality, which can markedly improve exploration efficiency and algorithm performance.
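As a rough illustration of the push-forward idea described above, the sketch below maps an observation plus base noise through a network to an action, so the induced action distribution can be multi-modal without an explicit density; the noise dimension and architecture are assumptions, not the paper's model.

```python
# Minimal push-forward policy sketch: push simple Gaussian noise through a
# network conditioned on the observation. No explicit action density is kept
# (density-free), so multi-modal action distributions are possible.
# Architecture and noise dimension are assumptions for illustration only.
import torch
import torch.nn as nn


class PushForwardPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(nn.Linear(obs_dim + noise_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs):
        z = torch.randn(obs.shape[0], self.noise_dim)   # base noise
        return self.net(torch.cat([obs, z], dim=-1))    # pushed-forward action
```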
arXiv Detail & Related papers (2021-05-08T16:29:20Z)
- Deep Reinforcement Learning for Haptic Shared Control in Unknown Tasks [1.0635248457021496]
Haptic shared control (HSC) is an alternative to direct teleoperation in teleoperated systems.
The application of virtual guiding forces decreases the user's control effort and improves execution time in various tasks.
The challenge lies in developing controllers to provide the optimal guiding forces for the tasks that are being performed.
This work addresses this challenge by designing a controller based on the deep deterministic policy gradient (DDPG) algorithm to provide the assistance, and a convolutional neural network (CNN) to perform task detection.
arXiv Detail & Related papers (2021-01-15T17:27:38Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Zeroth-order Deterministic Policy Gradient [116.87117204825105]
We introduce Zeroth-order Deterministic Policy Gradient (ZDPG).
ZDPG approximates policy-reward gradients via two-point evaluations of the $Q$-function (a generic two-point estimator is sketched after this entry).
New finite sample complexity bounds for ZDPG improve upon existing results by up to two orders of magnitude.
arXiv Detail & Related papers (2020-06-12T16:52:29Z)
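The two-point evaluation mentioned in the ZDPG entry refers to a standard zeroth-order gradient estimate; the sketch below shows the generic estimator, with the perturbation site and scaling being assumptions rather than the exact ZDPG construction.

```python
# Generic two-point zeroth-order gradient estimate of a scalar objective J:
#   g ~ (d / (2 * mu)) * (J(theta + mu * u) - J(theta - mu * u)) * u,
# where u is a random unit direction and d the parameter dimension. ZDPG
# applies this idea to Q-function evaluations; details here are assumptions.
import numpy as np


def two_point_gradient(J, theta, mu=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)                           # random unit direction
    delta = J(theta + mu * u) - J(theta - mu * u)    # two objective evaluations
    return (theta.size / (2.0 * mu)) * delta * u
```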