Bag of Tricks for Natural Policy Gradient Reinforcement Learning
- URL: http://arxiv.org/abs/2201.09104v1
- Date: Sat, 22 Jan 2022 17:44:19 GMT
- Title: Bag of Tricks for Natural Policy Gradient Reinforcement Learning
- Authors: Brennan Gebotys, Alexander Wong, David A. Clausi
- Abstract summary: We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
- Score: 87.54231228860495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural policy gradient methods are popular reinforcement learning methods
that improve the stability of policy gradient methods by preconditioning the
gradient with the inverse of the Fisher-information matrix. However, leveraging
natural policy gradient methods in an optimal manner can be very challenging as
many implementation details must be set to achieve optimal performance. To the
best of the authors' knowledge, there has not been a study that has
investigated strategies for setting these details for natural policy gradient
methods to achieve high performance in a comprehensive and systematic manner.
To address this, we have implemented and compared strategies that impact
performance in natural policy gradient reinforcement learning across five
different second-order approximations. These include varying batch sizes and
optimizing the critic network using the natural gradient. Furthermore, insights
about the fundamental trade-offs when optimizing for performance (stability,
sample efficiency, and computation time) were generated. Experimental results
indicate that the proposed collection of strategies for performance
optimization can improve results by 86% to 181% across the MuJoCo control
benchmark, with TENGraD exhibiting the best approximation performance amongst
the tested approximations. Code in this study is available at
https://github.com/gebob19/natural-policy-gradient-reinforcement-learning.
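For reference, the update the abstract describes, preconditioning the policy gradient with the inverse of the Fisher information matrix, can be sketched as follows. This is a minimal, framework-agnostic illustration using the empirical Fisher and a dense linear solve; the function name, damping constant, and step size are illustrative assumptions, and it is not the implementation from the linked repository, which studies five second-order approximations.

```python
# Minimal sketch of a natural policy gradient update (illustrative only).
# Assumes per-sample score vectors grad_theta log pi(a|s) and advantage
# estimates are already available.
import numpy as np

def natural_policy_gradient_step(score_vectors, advantages, step_size=0.01, damping=1e-3):
    """Precondition the vanilla policy gradient with the inverse Fisher matrix.

    score_vectors: (N, d) array of grad_theta log pi(a_i | s_i)
    advantages:    (N,)   array of advantage estimates A(s_i, a_i)
    Returns the parameter update delta_theta of shape (d,).
    """
    n = len(advantages)
    # Vanilla policy gradient: g = E[grad log pi * A]
    g = score_vectors.T @ advantages / n
    # Empirical Fisher information matrix: F = E[grad log pi grad log pi^T]
    fisher = score_vectors.T @ score_vectors / n
    fisher += damping * np.eye(fisher.shape[0])  # damping keeps the solve well-conditioned
    # Natural gradient: F^{-1} g
    natural_grad = np.linalg.solve(fisher, g)
    return step_size * natural_grad

# Usage with random placeholder data: d = 8 parameters, N = 256 samples.
rng = np.random.default_rng(0)
scores = rng.normal(size=(256, 8))
advs = rng.normal(size=256)
delta_theta = natural_policy_gradient_step(scores, advs)
```

The abstract also lists optimizing the critic network with the natural gradient as one of the strategies; the same preconditioning can be applied to the gradient of the critic's regression loss, and at network scale the dense solve would be replaced by conjugate gradient or one of the paper's second-order approximations.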
Related papers
- vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement [57.926269845305804]
This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement.
We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process.
We find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process (a minimal sketch of one such disagreement measure is given after this list).
arXiv Detail & Related papers (2024-05-14T14:18:25Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Actor-Critic Reinforcement Learning with Phased Actor [10.577516871906816]
We propose a novel phased actor in actor-critic (PAAC) method to improve policy gradient estimation.
PAAC accounts for both $Q$ value and TD error in its actor update.
Results show that PAAC leads to significant performance improvement measured by total cost, learning variance, robustness, learning speed and success rate.
arXiv Detail & Related papers (2024-04-18T01:27:31Z)
- Gradient Informed Proximal Policy Optimization [35.22712034665224]
We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.
By adaptively modifying the alpha value, we can effectively manage the influence of analytical policy gradients during learning.
Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments.
arXiv Detail & Related papers (2023-12-14T07:50:21Z)
- Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback [22.21598324895312]
This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback control.
We derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods.
We provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when near such minima.
arXiv Detail & Related papers (2023-10-29T14:25:57Z)
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
- On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method [38.34416337932712]
Policy gradient gives rise to a rich class of reinforcement learning (RL) methods, for example REINFORCE.
Yet the best known sample complexity result for such methods to find an $\epsilon$-optimal policy is $\mathcal{O}(\epsilon^{-3})$, which is suboptimal.
We study the fundamental convergence properties and sample efficiency of first-order policy optimization method.
arXiv Detail & Related papers (2021-02-17T07:06:19Z)
- Efficient Wasserstein Natural Gradients for Reinforcement Learning [31.15380502703101]
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning.
The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization.
arXiv Detail & Related papers (2020-10-12T00:50:17Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z)
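As a concrete illustration of the vMFER entry above, below is a minimal sketch, assuming an ensemble of critics, of one way to quantify the uncertainty of gradient directions: normalize each critic's gradient for a transition and use the length of their mean (the mean resultant length) as a certainty score, then resample transitions in proportion to it. The helper names and the resampling scheme are illustrative assumptions; the estimator and resampling distribution used in vMFER may differ.

```python
# Sketch: measure disagreement among ensemble gradient directions and use it
# to weight experience resampling (illustrative; not the vMFER implementation).
import numpy as np

def gradient_direction_certainty(ensemble_grads, eps=1e-8):
    """ensemble_grads: (K, d) array, one policy-gradient vector per critic.

    Returns the mean resultant length in [0, 1]: values near 1 mean the
    gradients agree on a direction (low uncertainty), values near 0 mean
    they disagree (high uncertainty).
    """
    norms = np.linalg.norm(ensemble_grads, axis=1, keepdims=True) + eps
    directions = ensemble_grads / norms
    return float(np.linalg.norm(directions.mean(axis=0)))

def resampling_weights(per_transition_grads):
    """per_transition_grads: (N, K, d) gradients from K critics for N transitions."""
    certainties = np.array([gradient_direction_certainty(g) for g in per_transition_grads])
    return certainties / certainties.sum()

# Usage with random placeholder data: 32 transitions, 5 critics, 16-dim gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5, 16))
weights = resampling_weights(grads)
minibatch_idx = rng.choice(len(weights), size=8, replace=False, p=weights)
```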
This list is automatically generated from the titles and abstracts of the papers on this site.