Towards Provable Log Density Policy Gradient
- URL: http://arxiv.org/abs/2403.01605v1
- Date: Sun, 3 Mar 2024 20:09:09 GMT
- Title: Towards Provable Log Density Policy Gradient
- Authors: Pulkit Katdare, Anant Joshi and Katherine Driggs-Campbell
- Abstract summary: Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning.
In this work, we argue that the residual error introduced by modern policy gradient estimators is significant, and that correcting for it could improve the sample complexity of reinforcement learning methods.
We propose the log density gradient to estimate the policy gradient, which corrects for this residual error term.
- Score: 6.0891236991406945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy gradient methods are a vital ingredient behind the success of modern
reinforcement learning. Modern policy gradient methods, although successful,
introduce a residual error in gradient estimation. In this work, we argue that
this residual term is significant and correcting for it could potentially
improve sample-complexity of reinforcement learning methods. To that end, we
propose log density gradient to estimate the policy gradient, which corrects
for this residual error term. The log density gradient method computes the policy
gradient using the discounted state-action distribution formulation.
We first present the equations needed to exactly compute the log density gradient
for tabular Markov Decision Processes (MDPs). For more complex environments,
we propose a temporal difference (TD) method that approximates the log density
gradient using backward on-policy samples. Since backward sampling from
a Markov chain is highly restrictive, we also propose a min-max optimization
that can approximate the log density gradient using only on-policy samples. We also
prove uniqueness and convergence, under linear function approximation, of this
min-max optimization. Finally, we show that the sample complexity of our
min-max optimization is of the order of $m^{-1/2}$, where $m$ is the number
of on-policy samples. We also demonstrate a proof-of-concept of our log
density gradient method on a gridworld environment, and observe that our method
is able to improve upon the classical policy gradient method by a clear margin,
thus indicating a promising novel direction to develop reinforcement learning
algorithms that require fewer samples.
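To make the discounted-distribution formulation concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): for a tabular MDP with a softmax policy, $d^{\pi_\theta}(s,a)$ can be computed exactly, and the identity $\nabla_\theta J(\theta) = \tfrac{1}{1-\gamma}\sum_{s,a} d^{\pi_\theta}(s,a)\,\nabla_\theta \log d^{\pi_\theta}(s,a)\,r(s,a)$ (which follows from $J(\theta) = \tfrac{1}{1-\gamma}\sum_{s,a} d^{\pi_\theta}(s,a)\,r(s,a)$ and $\nabla d = d\,\nabla \log d$) can be checked against a finite-difference gradient of $J$.
```python
# Minimal numerical sketch (not the authors' code): on a small random tabular
# MDP with a softmax policy, compute the discounted state-action distribution
# d(s, a) exactly and check the log density gradient identity
#   grad J = 1/(1 - gamma) * sum_{s,a} d(s, a) * grad log d(s, a) * r(s, a)
# against a finite-difference gradient of the return J.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']: transition kernel
r = rng.uniform(size=(S, A))                 # r(s, a): reward
mu0 = np.ones(S) / S                         # initial state distribution

def policy(theta):
    """Softmax policy pi(a|s) with one parameter per (s, a)."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def occupancy(theta):
    """Discounted state-action distribution d(s, a) under pi_theta."""
    pi = policy(theta)
    P_pi = np.einsum('sa,sat->st', pi, P)    # state-to-state kernel under pi
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi                 # d(s, a) = d(s) * pi(a|s)

def J(theta):
    """Expected discounted return, written via the occupancy measure."""
    return (occupancy(theta) * r).sum() / (1 - gamma)

theta, eps = rng.normal(size=(S, A)), 1e-6
d = occupancy(theta)
grad_fd = np.zeros_like(theta)       # finite-difference gradient of J (reference)
grad_logd = np.zeros_like(theta)     # log density gradient estimate
for i in range(S):
    for j in range(A):
        dth = np.zeros_like(theta); dth[i, j] = eps
        grad_fd[i, j] = (J(theta + dth) - J(theta - dth)) / (2 * eps)
        dlogd = (np.log(occupancy(theta + dth)) - np.log(occupancy(theta - dth))) / (2 * eps)
        grad_logd[i, j] = (d * dlogd * r).sum() / (1 - gamma)

print(np.max(np.abs(grad_fd - grad_logd)))   # close to zero: the two gradients agree
```
The finite-difference $\nabla_\theta \log d$ above is only for verification; the paper's contribution is estimating that quantity from samples, via the TD recursion or the min-max objective.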
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
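As a rough, generic illustration only (my sketch, assuming the standard dual/representer-weight objective of kernel ridge regression; it is not the paper's stochastic dual descent algorithm, whose specific optimisation and kernel insights are its contribution):
```python
# Generic sketch: stochastic block-coordinate descent on the dual objective of
# kernel ridge regression, 0.5 * a^T (K + lam*I) a - a^T y, over representer
# weights a. Illustrative only; not the paper's stochastic dual descent method.
import numpy as np

def stochastic_dual_descent(K, y, lam=1e-1, lr=1e-2, batch=32, steps=2000, seed=0):
    """K: (n, n) kernel matrix, y: (n,) targets. Returns representer weights a,
    so predictions at new points are K(X_new, X_train) @ a."""
    rng = np.random.default_rng(seed)
    n = y.size
    a = np.zeros(n)
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        # gradient of the dual objective restricted to the sampled coordinates
        g = K[idx] @ a + lam * a[idx] - y[idx]
        a[idx] -= lr * g          # lr must be tuned to the spectrum of K
    return a
```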
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm are used as clipping thresholds, making the iterates robust to gradient outliers.
We propose an implementation of the algorithm using Huberiles.
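A minimal sketch of the general idea as I read it from the title and summary (clip at a running quantile of recent gradient norms); the window, quantile level, and function below are illustrative, and the paper's exact estimator and guarantees are not reproduced here:
```python
# Illustrative sketch: SGD step whose clipping threshold is a running quantile
# of recently observed gradient norms (robust to heavy-tailed/outlier gradients).
import numpy as np
from collections import deque

def quantile_clipped_step(w, grad, norm_history, q=0.9, lr=1e-2, window=100):
    g_norm = float(np.linalg.norm(grad))
    norm_history.append(g_norm)
    if len(norm_history) > window:
        norm_history.popleft()
    tau = np.quantile(norm_history, q)   # clipping threshold = q-quantile of norms
    if g_norm > tau > 0:
        grad = grad * (tau / g_norm)     # rescale outlier gradients to norm tau
    return w - lr * grad

# Usage: keep norm_history = deque() across iterations and call
# quantile_clipped_step once per mini-batch gradient.
```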
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent, and makes the inductive bias of the model clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z) - Convergence of Batch Stochastic Gradient Descent Methods with
Approximate Gradients and/or Noisy Measurements: Theory and Computational
Results [0.9900482274337404]
We study convex optimization using a very general formulation called BSGD (Batch Stochastic Gradient Descent).
We establish conditions for BSGD to converge to the global minimum, based on approximation theory.
We show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge.
arXiv Detail & Related papers (2022-09-12T16:23:15Z) - A Temporal-Difference Approach to Policy Gradient Estimation [27.749993205038148]
We propose a new approach for reconstructing the policy gradient from the start state without requiring a particular sampling strategy.
By using temporal-difference updates of the gradient critic from an off-policy data stream, we develop the first estimator that sidesteps the distribution shift issue in a model-free way.
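To make this concrete, here is one hedged reconstruction from the summary (using only the standard Bellman equation; the "gradient critic" notation is mine, and the paper's exact estimator may differ). Differentiating $Q^\pi(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}\!\left[\sum_{a'} \pi(a'\mid s')\,Q^\pi(s',a')\right]$ with respect to the policy parameters $\theta$ gives a Bellman-style recursion that a gradient critic can be trained to satisfy by temporal-difference updates:
$$\nabla_\theta Q^\pi(s,a) = \gamma\, \mathbb{E}_{s' \sim P(\cdot\mid s,a),\, a' \sim \pi(\cdot\mid s')}\!\left[ \nabla_\theta \log \pi(a'\mid s')\, Q^\pi(s',a') + \nabla_\theta Q^\pi(s',a') \right],$$
$$\nabla_\theta J(\theta) = \mathbb{E}_{s_0 \sim \mu_0,\, a_0 \sim \pi(\cdot\mid s_0)}\!\left[ \nabla_\theta \log \pi(a_0\mid s_0)\, Q^\pi(s_0,a_0) + \nabla_\theta Q^\pi(s_0,a_0) \right].$$
A learned approximation of $\nabla_\theta Q^\pi$ plugged into the recursion yields a TD target, and the second identity recovers the policy gradient from the start-state distribution, matching the summary's "reconstructing the policy gradient from the start state".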
arXiv Detail & Related papers (2022-02-04T21:23:33Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Tighter Bounds on the Log Marginal Likelihood of Gaussian Process
Regression Using Conjugate Gradients [19.772149500352945]
We show that approximate maximum likelihood learning of model parameters by maximising our lower bound retains many of the benefits of the sparse variational approach.
In experiments, we show improved predictive performance with our model for a comparable amount of training time compared to other conjugate gradient based approaches.
arXiv Detail & Related papers (2021-02-16T17:54:59Z) - Stochastic Gradient Variance Reduction by Solving a Filtering Problem [0.951828574518325]
Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient from samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that makes consistent estimates of the gradient.
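A minimal stand-in sketch (my illustration only; FGD's actual filter design is specified in that paper): treat the stream of noisy mini-batch gradients as a signal and low-pass filter it before each update.
```python
# Illustrative sketch: low-pass filtering of noisy stochastic gradients before
# the descent step, to reduce the variance of the update direction.
import numpy as np

def filtered_gradient_descent(grad_fn, w0, lr=0.05, beta=0.9, steps=200):
    w = np.asarray(w0, dtype=float)
    g_filt = np.zeros_like(w)
    for _ in range(steps):
        g_noisy = grad_fn(w)                            # noisy gradient sample
        g_filt = beta * g_filt + (1 - beta) * g_noisy   # first-order filter
        w = w - lr * g_filt
    return w

# Toy usage: minimize ||w||^2 from gradients corrupted by Gaussian noise.
rng = np.random.default_rng(1)
w_min = filtered_gradient_descent(lambda w: 2 * w + rng.normal(scale=0.5, size=w.shape),
                                  w0=np.ones(3))
```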
arXiv Detail & Related papers (2020-12-22T23:48:42Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
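As a minimal illustration of the ingredients named here (function-query-only gradient estimation combined with coordinate importance sampling), not the paper's hybrid scheme; the sampling distribution `probs` and the weighting below are my own simple choices:
```python
# Illustrative sketch: two-point zeroth-order gradient estimator that samples
# coordinates from a given distribution and importance-weights them so the
# estimate stays unbiased (up to finite-difference error). Query cost: 2 per sample.
import numpy as np

def zo_gradient(f, x, n_samples=5, mu=1e-4, probs=None, rng=None):
    rng = rng or np.random.default_rng()
    d = x.size
    probs = np.full(d, 1.0 / d) if probs is None else np.asarray(probs)
    g = np.zeros(d)
    for i in rng.choice(d, size=n_samples, replace=True, p=probs):
        e = np.zeros(d); e[i] = mu
        fd = (f(x + e) - f(x - e)) / (2 * mu)    # coordinate finite difference
        g[i] += fd / (probs[i] * n_samples)      # importance weight -> unbiased
    return g

# Usage: zo_gradient(lambda z: np.sum(z ** 2), np.ones(4)) approximates
# [2, 2, 2, 2] in expectation, using only function evaluations.
```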
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Deep Bayesian Quadrature Policy Optimization [100.81242753620597]
Deep Bayesian quadrature policy gradient (DBQPG) is a high-dimensional generalization of Bayesian quadrature for policy gradient estimation.
We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks.
arXiv Detail & Related papers (2020-06-28T15:44:47Z) - Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)