Deep Bayesian Quadrature Policy Optimization
- URL: http://arxiv.org/abs/2006.15637v3
- Date: Wed, 16 Dec 2020 15:14:05 GMT
- Title: Deep Bayesian Quadrature Policy Optimization
- Authors: Akella Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Anima
Anandkumar, Yisong Yue
- Abstract summary: Deep Bayesian quadrature policy gradient (DBQPG) is a high-dimensional generalization of Bayesian quadrature for policy gradient estimation.
We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks.
- Score: 100.81242753620597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of obtaining accurate policy gradient estimates using a
finite number of samples. Monte-Carlo methods have been the default choice for
policy gradient estimation, despite suffering from high variance in the
gradient estimates. On the other hand, more sample efficient alternatives like
Bayesian quadrature methods have received little attention due to their high
computational complexity. In this work, we propose deep Bayesian quadrature
policy gradient (DBQPG), a computationally efficient high-dimensional
generalization of Bayesian quadrature, for policy gradient estimation. We show
that DBQPG can substitute Monte-Carlo estimation in policy gradient methods,
and demonstrate its effectiveness on a set of continuous control benchmarks. In
comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient
estimates with a significantly lower variance, (ii) a consistent improvement in
the sample complexity and average return for several deep policy gradient
algorithms, and, (iii) the uncertainty in gradient estimation that can be
incorporated to further improve the performance.
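To make the Bayesian-quadrature idea concrete, here is a minimal 1-D sketch in NumPy: a Gaussian toy policy with a quadratic Q-function, where the policy-gradient integral is estimated once by a Monte-Carlo average and once by Bayesian quadrature with an RBF kernel. The toy problem, the kernel choice, and placing the GP on the whole integrand (rather than on Q, as DBQPG does) are illustrative assumptions; the paper's deep-kernel and scalability machinery is omitted.

```python
# Illustrative 1-D sketch of Bayesian quadrature vs. Monte-Carlo gradient estimation.
# Toy setup (assumed for illustration): policy N(a; theta, 1) with theta = 0 and
# Q(a) = -(a - 1)^2, so grad J = E_a[ d/dtheta log pi(a) * Q(a) ] = E_a[ a * Q(a) ] = 2.
import numpy as np

rng = np.random.default_rng(0)

def integrand(a):
    return a * (-(a - 1.0) ** 2)        # score function (a - theta) times Q(a), with theta = 0

n, ell, jitter = 30, 1.0, 1e-4
a = rng.standard_normal(n)              # actions sampled from the policy N(0, 1)
f = integrand(a)

# (i) Monte-Carlo estimate: plain sample average of the integrand.
mc_estimate = f.mean()

# (ii) Bayesian quadrature: GP prior (RBF kernel) on the integrand; the posterior
# mean of the integral is z^T (K + jitter*I)^{-1} f, where z_i is the closed-form
# integral of k(a, a_i) against the Gaussian policy measure N(0, 1).
K = np.exp(-0.5 * (a[:, None] - a[None, :]) ** 2 / ell ** 2)
z = ell / np.sqrt(ell ** 2 + 1.0) * np.exp(-0.5 * a ** 2 / (ell ** 2 + 1.0))
bq_estimate = z @ np.linalg.solve(K + jitter * np.eye(n), f)

print(f"true gradient = 2.0, MC = {mc_estimate:.3f}, BQ = {bq_estimate:.3f}")
```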
Related papers
- Gradient Informed Proximal Policy Optimization [35.22712034665224]
We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.
By adaptively modifying the alpha value, we can effectively manage the influence of analytical policy gradients during learning.
Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments.
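A hedged sketch of the blending idea in this entry, assuming the combination is a convex mix of the two gradient estimates and that alpha is adapted from their empirical variances; the adaptation rule and function names are illustrative, not the paper's exact scheme.

```python
# Illustrative alpha-blending of an analytical (differentiable-simulator) gradient
# with a likelihood-ratio / PPO-style gradient. The inverse-variance adaptation of
# alpha is an assumption for this sketch, not the paper's exact rule.
import numpy as np

def blended_gradient(analytic_grads, lr_grads, eps=1e-8):
    """Both inputs: (num_batches, num_params) arrays of per-batch gradient estimates."""
    var_a = analytic_grads.var(axis=0).sum()
    var_l = lr_grads.var(axis=0).sum()
    alpha = var_l / (var_a + var_l + eps)     # trust the lower-variance estimator more
    return alpha * analytic_grads.mean(axis=0) + (1.0 - alpha) * lr_grads.mean(axis=0)
```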
arXiv Detail & Related papers (2023-12-14T07:50:21Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
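As a rough illustration of the proposed fix, the sketch below applies standard spectral normalization (power iteration) to a dynamics-model weight matrix so that its spectral norm stays at most one; which layers are normalized and how this is wired into the unrolled model are assumptions not taken from the paper.

```python
# Spectral normalization of a weight matrix via power iteration; a generic sketch of
# the kind of normalization proposed for long model unrolls (integration details assumed).
import numpy as np

def spectral_normalize(W, n_iter=20):
    """Rescale W so that its largest singular value is at most 1."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v                          # estimated top singular value
    return W / max(sigma, 1.0)                 # only shrink if the norm exceeds 1
```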
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Policy Learning and Evaluation with Randomized Quasi-Monte Carlo [28.835015520341766]
We propose to replace Monte Carlo samples with low-discrepancy point sets.
We combine policy gradient methods with Randomized Quasi-Monte Carlo, yielding variance-reduced formulations of policy gradient and actor-critic algorithms.
Our empirical analyses validate the intuition that replacing Monte Carlo with Quasi-Monte Carlo yields significantly more accurate gradient estimates.
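A minimal sketch of the substitution: SciPy's scrambled Sobol sequence, mapped through the Gaussian inverse CDF, replaces the i.i.d. Gaussian noise a Monte-Carlo estimator would draw. The surrounding actor-critic machinery from the paper is omitted, and the helper below is hypothetical.

```python
# Draw N(0, I) "exploration noise" either with plain Monte-Carlo or with randomized
# quasi-Monte-Carlo (scrambled Sobol) points mapped through the Gaussian inverse CDF.
import numpy as np
from scipy.stats import norm, qmc

def gaussian_noise(n, dim, use_rqmc=True, seed=0):
    if use_rqmc:
        u = qmc.Sobol(d=dim, scramble=True, seed=seed).random(n)   # low-discrepancy points in (0, 1)^dim
    else:
        u = np.random.default_rng(seed).random((n, dim))           # i.i.d. uniform points
    return norm.ppf(u)                                             # map to standard normal samples

noise = gaussian_noise(256, dim=4)   # e.g. noise for 256 rollouts of a 4-D Gaussian policy
```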
arXiv Detail & Related papers (2022-02-16T00:42:12Z)
- PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation [6.063525456640462]
We propose a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of updates.
We show that our method enjoys an $\mathcal{O}\left(\epsilon^{-3}\right)$ average sample complexity to reach an $\epsilon$-stationary solution.
A numerical evaluation confirms the competitive performance of our method on classical control tasks.
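The loopless switch can be sketched as follows, assuming a PAGE-style update in which a large-batch gradient is recomputed only with a small probability and the previous estimate is otherwise corrected on a shared small batch; the helper names, batch sizes, and the omitted importance weighting are illustrative assumptions.

```python
# PAGE-style probabilistic switch between a large-batch gradient and a cheap
# recursive correction; grad_fn and sample_batch are hypothetical helpers.
import numpy as np

def page_pg_step(theta, theta_prev, g_prev, grad_fn, sample_batch, rng, p=0.1):
    """grad_fn(params, batch) -> gradient estimate; sample_batch(size) -> trajectory batch."""
    if rng.random() < p:
        return grad_fn(theta, sample_batch(1024))       # occasional large-batch estimate
    batch = sample_batch(32)                            # same small batch at both parameter values
    return g_prev + grad_fn(theta, batch) - grad_fn(theta_prev, batch)
```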
arXiv Detail & Related papers (2022-02-01T10:10:49Z)
- Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration [39.250754806600135]
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy.
Conventional methods for off-policy PG estimation often suffer from significant bias or exponentially large variance.
In this paper, we propose the double Fitted PG estimation (FPG) algorithm.
arXiv Detail & Related papers (2022-01-31T20:23:52Z)
- Zeroth-order Deterministic Policy Gradient [116.87117204825105]
We introduce Zeroth-order Deterministic Policy Gradient (ZDPG).
ZDPG approximates policy-reward gradients via two-point evaluations of the $Q$-function.
New finite sample complexity bounds for ZDPG improve upon existing results by up to two orders of magnitude.
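The core two-point idea can be sketched generically: perturb the parameters along a random unit direction and difference two objective evaluations. ZDPG applies this to Q-function evaluations along the deterministic policy; the black-box objective J below is a simplifying assumption.

```python
# Generic two-point zeroth-order gradient estimator (sphere smoothing); J is a
# black-box objective standing in for the Q-function evaluations used by ZDPG.
import numpy as np

def two_point_gradient(J, theta, delta=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)                                          # random unit direction
    return theta.size * (J(theta + delta * u) - J(theta - delta * u)) / (2.0 * delta) * u
```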
arXiv Detail & Related papers (2020-06-12T16:52:29Z)
- Stochastic Recursive Momentum for Policy Gradient Methods [28.277961340108313]
We propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (Storm-PG).
Storm-PG enjoys a provably sharp $O(1/\epsilon^3)$ sample complexity bound, matching the best-known convergence rate for policy gradient algorithms.
Numerical experiments depict the superiority of our algorithm over comparative policy gradient algorithms.
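The recursive-momentum update at the heart of STORM-style estimators can be sketched as below; the importance-sampling correction that the policy-gradient setting requires (and that Storm-PG handles) is omitted, and grad_fn is a hypothetical helper.

```python
# STORM-style recursive momentum: d_t = g(theta_t) + (1 - a) * (d_{t-1} - g(theta_{t-1})),
# with both gradients computed on the same fresh batch; grad_fn is a hypothetical helper.
def storm_pg_update(d_prev, theta, theta_prev, grad_fn, batch, a=0.1):
    return grad_fn(theta, batch) + (1.0 - a) * (d_prev - grad_fn(theta_prev, batch))
```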
arXiv Detail & Related papers (2020-03-09T17:59:03Z)
- Statistically Efficient Off-Policy Policy Gradients [80.42316902296832]
We consider the statistically efficient estimation of policy gradients from off-policy data.
We propose a meta-algorithm that achieves the lower bound without any parametric assumptions.
We establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
arXiv Detail & Related papers (2020-02-10T18:41:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.