An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients
- URL: http://arxiv.org/abs/2107.09359v1
- Date: Tue, 20 Jul 2021 09:26:10 GMT
- Title: An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients
- Authors: João Carvalho, Davide Tateo, Fabio Muratore, Jan Peters
- Abstract summary: We study a different type of gradient estimator: the Measure-Valued Derivative.
This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators.
We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it reaches performance comparable to methods based on the likelihood-ratio or reparametrization tricks.
- Score: 24.976352541745403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning methods for robotics are increasingly successful due
to the constant development of better policy gradient techniques. A precise
(low variance) and accurate (low bias) gradient estimator is crucial for tackling
increasingly complex tasks. Traditional policy gradient algorithms use the
likelihood-ratio trick, which is known to produce unbiased but high variance
estimates. More modern approaches exploit the reparametrization trick, which
gives lower variance gradient estimates but requires differentiable value
function approximators. In this work, we study a different type of stochastic
gradient estimator: the Measure-Valued Derivative. This estimator is unbiased,
has low variance, and can be used with differentiable and non-differentiable
function approximators. We empirically evaluate this estimator in the
actor-critic policy gradient setting and show that it reaches performance
comparable to methods based on the likelihood-ratio or reparametrization
tricks, in both low- and high-dimensional action spaces.
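
The three estimators contrasted in the abstract can be illustrated on a one-dimensional toy problem. The sketch below is an illustration, not the authors' code: it estimates the gradient of E_{x ~ N(mu, sigma^2)}[f(x)] with respect to the mean mu for f(x) = x^2 (a stand-in for the critic), whose exact gradient is 2*mu, using the likelihood-ratio trick, the reparametrization trick, and the Measure-Valued Derivative of the Gaussian mean.

```python
# Minimal sketch (not the paper's implementation): three Monte Carlo estimators
# of d/d_mu E_{x ~ N(mu, sigma^2)}[f(x)] for the toy function f(x) = x^2,
# whose exact gradient is 2 * mu.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 0.7, 200_000
f = lambda x: x ** 2            # stand-in for the critic / return
df = lambda x: 2.0 * x          # derivative, needed only by reparametrization

# 1) Likelihood-ratio (score-function) estimator: f(x) * d log p(x) / d mu.
x = rng.normal(mu, sigma, n)
grad_lr = np.mean(f(x) * (x - mu) / sigma ** 2)

# 2) Reparametrization estimator: x = mu + sigma * eps, differentiate through f.
eps = rng.normal(0.0, 1.0, n)
grad_rp = np.mean(df(mu + sigma * eps))

# 3) Measure-valued derivative w.r.t. the Gaussian mean: the derivative of the
#    density splits into a positive and a negative part, each a Weibull(sqrt(2), 2)
#    (i.e. Rayleigh) perturbation of mu, weighted by 1 / (sigma * sqrt(2 * pi)).
w_pos = sigma * np.sqrt(2.0) * rng.weibull(2.0, n)
w_neg = sigma * np.sqrt(2.0) * rng.weibull(2.0, n)
grad_mvd = (np.mean(f(mu + w_pos)) - np.mean(f(mu - w_neg))) / (sigma * np.sqrt(2.0 * np.pi))

print(f"exact {2 * mu:.3f} | LR {grad_lr:.3f} | RP {grad_rp:.3f} | MVD {grad_mvd:.3f}")
```

Note that only the reparametrization estimator calls df, which is why it requires a differentiable function approximator, while the likelihood-ratio and measure-valued estimators only evaluate f.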
Related papers
- Compatible Gradient Approximations for Actor-Critic Algorithms [0.0]
We introduce an actor-critic algorithm that bypasses the need for precise action-value gradients by employing a zeroth-order approximation of the action-value gradient.
Empirical results demonstrate that our algorithm not only matches but frequently exceeds the performance of current state-of-the-art methods.
arXiv Detail & Related papers (2024-09-02T22:00:50Z)
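For context on the entry above, a generic two-point zeroth-order approximation of an action-value gradient looks roughly like the sketch below; the function name q_fn, the smoothing radius delta, and the sphere-sampling scheme are illustrative assumptions, not the paper's actual construction.

```python
# Hedged sketch of a generic two-point zeroth-order gradient approximation of
# an action-value function: only evaluations of Q are needed, no backprop.
import numpy as np

def zeroth_order_action_grad(q_fn, state, action, delta=1e-2, n_dirs=16, rng=None):
    """Approximate grad_a Q(s, a) from function evaluations only."""
    rng = np.random.default_rng() if rng is None else rng
    d = action.shape[0]
    grad = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                      # random unit direction
        q_plus = q_fn(state, action + delta * u)    # two-point evaluation
        q_minus = q_fn(state, action - delta * u)
        grad += (d / (2.0 * delta)) * (q_plus - q_minus) * u
    return grad / n_dirs
```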
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
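As background for the entry above, a hedged sketch of the standard importance-sampling-corrected (off-policy) policy gradient is shown below; the Gaussian policy, the one-step bandit reward, and all variable names are illustrative simplifications, not the paper's method.

```python
# Hedged sketch: re-weight samples collected under a behavioral policy by the
# importance-sampling ratio pi_target / pi_behavior before forming the
# likelihood-ratio policy-gradient estimate (one-step Gaussian bandit setting).
import numpy as np

rng = np.random.default_rng(1)
mu_target, mu_behavior, sigma = 0.5, 0.0, 1.0
reward = lambda a: -(a - 1.0) ** 2                      # toy reward
gauss_pdf = lambda a, m, s: np.exp(-0.5 * ((a - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

a = rng.normal(mu_behavior, sigma, 100_000)             # actions from behavior policy
w = gauss_pdf(a, mu_target, sigma) / gauss_pdf(a, mu_behavior, sigma)   # IS ratios
score = (a - mu_target) / sigma ** 2                    # d log pi_target / d mu
grad_is = np.mean(w * reward(a) * score)                # off-policy gradient estimate
print(f"IS estimate of dJ/dmu at mu={mu_target}: {grad_is:.3f}  (exact: 1.000)")
```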
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
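Spectral normalization, as referred to in the entry above, is the generic technique of rescaling a weight matrix by an estimate of its largest singular value so that long unrolls cannot amplify perturbations without bound. A minimal power-iteration sketch is given below; the single-matrix NumPy setting is an illustrative assumption, not the paper's algorithm.

```python
# Minimal sketch of spectral normalization via power iteration: rescale a
# weight matrix so its largest singular value is approximately 1.
import numpy as np

def spectral_normalize(weight, n_iters=5, eps=1e-12):
    """Return `weight` divided by a power-iteration estimate of its spectral norm."""
    u = np.random.default_rng(0).normal(size=weight.shape[0])
    for _ in range(n_iters):                      # power iteration on W W^T
        v = weight.T @ u
        v /= np.linalg.norm(v) + eps
        u = weight @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ weight @ v                        # estimated top singular value
    return weight / max(sigma, eps)
```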
- An Analysis of Measure-Valued Derivatives for Policy Gradients [37.241788708646574]
We study a different type of gradient estimator - the Measure-Valued Derivative.
This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators.
We show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks.
arXiv Detail & Related papers (2022-03-08T08:26:31Z)
- Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
- Storchastic: A Framework for General Stochastic Automatic Differentiation [9.34612743192798]
We introduce Storchastic, a new framework for automatic differentiation of stochastic computation graphs.
Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step.
Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.
arXiv Detail & Related papers (2021-04-01T12:19:54Z)
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient [34.16700176918835]
Off-policy Reinforcement Learning holds the promise of better data efficiency.
Current off-policy policy gradient methods either suffer from high bias or high variance, often delivering unreliable estimates.
We propose a nonparametric Bellman equation, which can be solved in closed form.
arXiv Detail & Related papers (2020-10-27T13:40:06Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training of deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)