Related papers: Do Differentiable Simulators Give Better Policy Gradients?

Do Differentiable Simulators Give Better Policy Gradients?

URL: http://arxiv.org/abs/2202.00817v1
Date: Wed, 2 Feb 2022 00:12:28 GMT
Title: Do Differentiable Simulators Give Better Policy Gradients?
Authors: H.J. Terry Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake
Abstract summary: We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods.
Score: 62.54538644503705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
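The abstract describes the $\alpha$-order estimator only in words. Below is a minimal sketch, assuming Gaussian smoothing $F(\theta) = \mathbb{E}_{w \sim \mathcal{N}(0,\sigma^2)}[f(\theta + w)]$ of a scalar objective, a fixed $\alpha$, and that the $\alpha$-order estimate is the convex combination $\alpha$ · (first-order) + (1 - $\alpha$) · (zeroth-order) computed on shared samples. The unit-step objective and the helper names (`zeroth_order`, `first_order`, `alpha_order`, `smoothed_grad`) are hypothetical illustrations, not the authors' code. The step function illustrates the discontinuity issue the abstract raises: its pointwise gradient is zero almost everywhere, so the first-order estimate is biased toward 0, while the zeroth-order estimate tracks the true gradient of the smoothed objective.

```python
import math
import random

def step(x: float) -> float:
    # Hypothetical discontinuous objective: a unit step at x = 0.
    return 1.0 if x >= 0.0 else 0.0

def step_grad(x: float) -> float:
    # Pointwise derivative of the step. It is zero everywhere except at the
    # jump, which a continuous random sample hits with probability zero.
    return 0.0

def zeroth_order(f, theta, sigma, noise):
    # Zeroth-order (score-function) estimate of d/dtheta E[f(theta + w)],
    # w ~ N(0, sigma^2): the average of f(theta + w) * w / sigma^2.
    return sum(f(theta + w) * w for w in noise) / (len(noise) * sigma ** 2)

def first_order(df, theta, noise):
    # First-order (reparameterization) estimate: average of exact gradients.
    return sum(df(theta + w) for w in noise) / len(noise)

def alpha_order(f, df, theta, sigma, noise, alpha):
    # Assumed form: convex combination of the two estimates on the same samples.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * first_order(df, theta, noise)
            + (1.0 - alpha) * zeroth_order(f, theta, sigma, noise))

def smoothed_grad(theta, sigma):
    # Exact gradient of E[step(theta + w)] = P(w >= -theta),
    # i.e. the N(0, sigma^2) density evaluated at theta.
    return math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    rng = random.Random(0)
    sigma, theta = 1.0, 0.0
    noise = [rng.gauss(0.0, sigma) for _ in range(20000)]
    print("exact      ", smoothed_grad(theta, sigma))
    print("first-order", first_order(step_grad, theta, noise))
    print("zero-order ", zeroth_order(step, theta, sigma, noise))
    print("alpha=0.5  ", alpha_order(step, step_grad, theta, sigma, noise, 0.5))
```

At $\theta = 0$ and $\sigma = 1$ the first-order estimate is exactly 0, while the exact smoothed gradient is $1/\sqrt{2\pi} \approx 0.399$; with 20,000 samples the zeroth-order estimate should land close to the latter, and $\alpha = 0.5$ halves it. Choosing $\alpha$ is left to the caller in this sketch.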

Related papers

Proximal Iteration for Nonlinear Adaptive Lasso [1.866597543169743]
We study the approach of treating the penalty coefficients as additional decision variables to be learned in a Maximum a Posteriori manner. We develop a proximal gradient approach to jointly optimize these coefficients together with the parameters of any differentiable cost function.
arXiv (2024-12-07T19:19:55Z)
Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens [11.98212766542468]
We provide the first known algorithm that achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates.
arXiv (2024-04-16T18:54:57Z)
Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions. We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv (2022-09-11T13:32:39Z)
Momentum-Based Policy Gradient with Second-Order Information [40.51117836892182]
We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into gradient descent. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance-reduction process. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
arXiv (2022-05-17T11:56:50Z)
On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient. Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv (2021-11-09T14:40:24Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability at the best-known rates for smooth losses.
arXiv (2021-06-28T00:17:01Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information. We show that with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv (2020-12-21T17:29:58Z)
Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications [0.6445605125467573]
Gradient estimation is of crucial importance in statistics and learning theory. We consider here the classic regression setup, where a real-valued, square-integrable random variable $Y$ is to be predicted. We prove nonasymptotic bounds improving upon those obtained for alternative estimation methods.
arXiv (2020-06-26T15:19:43Z)
Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random-shuffling SGD after finite time. To the best of our knowledge, this is the first work to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv (2020-06-12T09:39:47Z)
Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated. We propose a new method for estimating these gradients that combines sampling and analytic approximation steps. We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models.
arXiv (2020-06-04T21:51:21Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems. Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv (2019-12-26T22:10:10Z)
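The Zeroth-Order Hybrid Gradient Descent entry above mentions coordinate importance sampling. As a generic illustration only, and not that paper's hybrid algorithm, the sketch below samples coordinates with user-chosen probabilities, estimates each sampled partial derivative with a central finite difference of step `mu`, and reweights by $1/(m\,p_j)$ so the estimate is unbiased for the finite-difference gradient. The function name `coordinate_zo_gradient`, its parameters, and the probabilities in the example are hypothetical.

```python
import random
from typing import Callable, List, Sequence

def coordinate_zo_gradient(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    mu: float,
    probs: Sequence[float],
    num_samples: int,
    rng: random.Random,
) -> List[float]:
    """Hypothetical importance-sampled, coordinate-wise zeroth-order gradient.

    Only function values of f are used (no derivatives). Each of the
    num_samples draws picks coordinate j with probability p[j] and costs
    two queries of f.
    """
    d = len(x)
    if len(probs) != d or any(p <= 0.0 for p in probs):
        raise ValueError("probs must have one positive entry per coordinate")
    total = sum(probs)
    p = [pi / total for pi in probs]  # normalize to a probability distribution
    grad = [0.0] * d
    for j in rng.choices(range(d), weights=p, k=num_samples):
        x_plus = list(x)
        x_plus[j] += mu
        x_minus = list(x)
        x_minus[j] -= mu
        partial = (f(x_plus) - f(x_minus)) / (2.0 * mu)  # central difference
        # Importance weight 1 / (m * p_j) keeps E[grad[j]] equal to partial.
        grad[j] += partial / (num_samples * p[j])
    return grad

if __name__ == "__main__":
    scales = [10.0, 1.0, 0.1]  # badly scaled quadratic: f(x) = sum_i c_i * x_i^2

    def quad(v: List[float]) -> float:
        return sum(c * vi * vi for c, vi in zip(scales, v))

    g = coordinate_zo_gradient(quad, [1.0, 1.0, 1.0], 1e-3,
                               [0.7, 0.2, 0.1], 20000, random.Random(1))
    print(g)  # the true gradient at (1, 1, 1) is (20, 2, 0.2)
```

Each call costs 2·`num_samples` function queries. Putting more probability on coordinates with larger partial derivatives (here the first) lowers the variance for a fixed query budget, which is the usual motivation for importance sampling over a uniform choice of coordinates.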

This list is automatically generated from the titles and abstracts of the papers on this site.