Efficient Wasserstein Natural Gradients for Reinforcement Learning
- URL: http://arxiv.org/abs/2010.05380v4
- Date: Thu, 18 Mar 2021 10:41:34 GMT
- Title: Efficient Wasserstein Natural Gradients for Reinforcement Learning
- Authors: Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton
- Abstract summary: A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning.
The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization.
- Score: 31.15380502703101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A novel optimization approach is proposed for application to policy gradient
methods and evolution strategies for reinforcement learning (RL). The procedure
uses a computationally efficient Wasserstein natural gradient (WNG) descent
that takes advantage of the geometry induced by a Wasserstein penalty to speed
optimization. This method follows the recent theme in RL of including a
divergence penalty in the objective to establish a trust region. Experiments on
challenging tasks demonstrate improvements in both computational cost and
performance over advanced baselines.
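Below is a minimal sketch, not the paper's implementation, of the trust-region idea the abstract describes: a vanilla policy-gradient surrogate penalized by a divergence between the new and old policies. For diagonal Gaussian policies the squared 2-Wasserstein distance has a closed form, which stands in for the penalty here; the paper's contribution is an efficient WNG estimator that exploits the geometry induced by such a penalty, which this naive penalized loss does not reproduce. All names (GaussianPolicy, penalized_pg_step, etc.) are illustrative.

```python
# Sketch: divergence-penalized policy-gradient step for a diagonal Gaussian
# policy. The penalty is the closed-form squared 2-Wasserstein distance
# between the new and old (diagonal Gaussian) action distributions.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mu = self.mean(obs)
        std = self.log_std.exp().expand_as(mu)
        return mu, std

def w2_sq_diag_gaussians(mu1, std1, mu2, std2):
    # Squared 2-Wasserstein distance between diagonal Gaussians (closed form),
    # averaged over the batch of states.
    return ((mu1 - mu2) ** 2 + (std1 - std2) ** 2).sum(-1).mean()

def penalized_pg_step(policy, old_policy, obs, actions, advantages, optimizer, lam=1.0):
    mu, std = policy(obs)
    with torch.no_grad():
        mu_old, std_old = old_policy(obs)
    logp = torch.distributions.Normal(mu, std).log_prob(actions).sum(-1)
    surrogate = -(logp * advantages).mean()                   # vanilla policy-gradient surrogate
    penalty = w2_sq_diag_gaussians(mu, std, mu_old, std_old)  # divergence trust-region term
    loss = surrogate + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```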
Related papers
- Mollification Effects of Policy Gradient Methods [16.617678267301702]
We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes.
We demonstrate the equivalence between policy gradient methods and solving backward heat equations.
We make the connection between this limitation and the uncertainty principle in harmonic analysis to understand the effects of exploration with policies in RL.
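One standard way to make the heat-equation connection concrete (our gloss, not the paper's precise statement): write the Gaussian-smoothed objective as $J_\sigma(\theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}[J(\theta + \epsilon)]$; then $u(\theta, t) = J_\sigma(\theta)$ with $t = \sigma^2/2$ solves the heat equation $\partial_t u = \Delta_\theta u$ with initial condition $u(\cdot, 0) = J$, so recovering the unmollified landscape from the smoothed one amounts to solving the heat equation backward in time, a classically ill-posed problem.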
arXiv Detail & Related papers (2024-05-28T05:05:33Z)
- Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization [0.0]
This work presents a first-of-a-kind approach that utilizes deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization.
arXiv Detail & Related papers (2023-05-09T23:51:24Z)
- Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization [43.6961980403682]
The computation of Wasserstein gradient direction is essential for posterior sampling problems and scientific computing.
We study the variational problem in the family of two-layer networks with squared-ReLU activations, towards which we derive a semi-definite programming (SDP) relaxation.
This SDP can be viewed as an approximation of the Wasserstein gradient in a broader function family including two-layer networks.
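A minimal sketch of the function class mentioned above, based only on the description in this summary: a two-layer network with squared-ReLU activations, whose input gradient plays the role of a candidate Wasserstein gradient direction (velocity field). The SDP relaxation the paper derives for the underlying variational problem is not reproduced here; all names are illustrative.

```python
# Sketch: f(x) = sum_i a_i * relu(w_i . x + b_i)^2 and its input gradient.
import torch

def squared_relu_net(x, W, b, a):
    # x: (n, d) samples; W: (m, d) first-layer weights; b: (m,); a: (m,).
    h = torch.relu(x @ W.T + b) ** 2   # squared-ReLU features, shape (n, m)
    return h @ a                       # f(x), shape (n,)

def candidate_gradient_direction(x, W, b, a):
    # grad_x f(x) for each sample, obtained by autodiff.
    x = x.clone().requires_grad_(True)
    f = squared_relu_net(x, W, b, a).sum()
    (grad_x,) = torch.autograd.grad(f, x)
    return grad_x                      # shape (n, d)
```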
arXiv Detail & Related papers (2022-05-26T00:51:12Z)
- Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
arXiv Detail & Related papers (2022-01-22T17:44:19Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization for reinforcement learning based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
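For orientation, the generic Bregman (mirror-descent) policy update underlying such methods is
$$\theta_{t+1} = \arg\min_{\theta}\ \langle -\hat{g}_t, \theta\rangle + \tfrac{1}{\eta}\, D_\psi(\theta, \theta_t), \qquad D_\psi(\theta, \theta') = \psi(\theta) - \psi(\theta') - \langle \nabla\psi(\theta'), \theta - \theta'\rangle,$$
where $\hat{g}_t$ is a (momentum-based or variance-reduced) policy-gradient estimate and $\psi$ is a strongly convex mirror map. This is the textbook template, not necessarily the exact VR-BGPO recursion.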
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- On the Linear convergence of Natural Policy Gradient Algorithm [5.027714423258537]
Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization.
Among these is the Natural Policy Gradient, which is a mirror descent variant for MDPs.
We present improved finite-time convergence bounds and show that this algorithm has a geometric convergence rate.
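For context, the natural policy gradient update is $\theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{\dagger} \nabla_\theta J(\theta_t)$ with Fisher information matrix $F(\theta) = \mathbb{E}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\big]$; for tabular softmax policies this step coincides with mirror descent under a KL-divergence Bregman term, which is the sense in which NPG is a mirror-descent variant. This standard formulation is stated here for reference rather than taken verbatim from the paper.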
arXiv Detail & Related papers (2021-05-04T11:26:12Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
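A minimal sketch of the two-point zeroth-order gradient estimator that such methods build on, with the perturbation distribution exposed as a pluggable sampler: in ZO-RL the sampler would itself be a policy learned with RL, whereas the baseline simply draws isotropic Gaussian perturbations. Names are illustrative, not from the paper's code.

```python
# Sketch: two-point ZO gradient estimate
#   g ~= (1/n) * sum_i [f(x + mu * u_i) - f(x)] / mu * u_i
import numpy as np

def zo_gradient(f, x, sampler, mu=1e-2, n_samples=20):
    d = x.shape[0]
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_samples):
        u = sampler(d)                     # perturbation direction (learned or random)
        g += (f(x + mu * u) - fx) / mu * u
    return g / n_samples

# Baseline perturbation sampler: isotropic Gaussian directions.
gaussian_sampler = lambda d: np.random.randn(d)
```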
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, whose iterations do not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and in terms of function query cost.
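A rough sketch (our illustration, not the paper's exact scheme) of coordinate-wise zeroth-order estimation with importance sampling over coordinates: sample a coordinate with probability probs[i], take a finite difference along it, and reweight by 1/probs[i] so the estimate stays unbiased up to the finite-difference error. Sampling "important" coordinates more often lowers the function-query cost for a given accuracy.

```python
import numpy as np

def zo_coordinate_gradient(f, x, probs, n_samples=5, mu=1e-3):
    # probs: per-coordinate sampling probabilities (sum to 1).
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(n_samples):
        i = np.random.choice(d, p=probs)
        e = np.zeros(d)
        e[i] = 1.0
        deriv = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)
        g[i] += deriv / probs[i]   # importance-weight correction
    return g / n_samples
```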
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of the gradient estimate.
These efforts result in a new discrete-action on-policy RL algorithm that empirically outperforms related on-policy algorithms that rely on variance-control techniques.
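As a generic illustration (not the paper's exact estimator) of using critic-estimated action values over discrete actions to control gradient variance, the standard "all-actions" policy-gradient loss computes the expectation over actions exactly under the current policy instead of sampling a single action.

```python
import torch
import torch.nn.functional as F

def all_action_pg_loss(logits, q_values):
    # logits:   (batch, n_actions) policy logits.
    # q_values: (batch, n_actions) critic estimates of Q(s, a) for every action.
    probs = F.softmax(logits, dim=-1)
    return -(probs * q_values.detach()).sum(dim=-1).mean()
```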
arXiv Detail & Related papers (2020-02-10T04:23:09Z)