Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms
- URL: http://arxiv.org/abs/2310.19927v1
- Date: Mon, 30 Oct 2023 18:43:21 GMT
- Title: Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms
- Authors: Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
- Abstract summary: ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
- Score: 88.74308282658133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely
adopted for continuous control tasks in robotics and computer graphics.
However, recent studies have revealed that, when applied to long-term
reinforcement learning problems, model-based RP PGMs may experience chaotic and
non-smooth optimization landscapes with exploding gradient variance, which
leads to slow convergence. This is in contrast to the conventional belief that
reparameterization methods have low gradient estimation variance in problems
such as training deep generative models. To comprehend this phenomenon, we
conduct a theoretical examination of model-based RP PGMs and search for
solutions to the optimization difficulties. Specifically, we analyze the
convergence of the model-based RP PGMs and pinpoint the smoothness of function
approximators as a major factor that affects the quality of gradient
estimation. Based on our analysis, we propose a spectral normalization method
to mitigate the exploding variance issue caused by long model unrolls. Our
experimental results demonstrate that proper normalization significantly
reduces the gradient variance of model-based RP PGMs. As a result, the
performance of the proposed method is comparable or superior to other gradient
estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is
available at https://github.com/agentification/RP_PGM.
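To make the abstract's two ingredients concrete, here is a minimal sketch (not the authors' implementation; see the linked repository for that) of a reparameterized policy gradient flowing through a short unroll of a learned dynamics model, with PyTorch's spectral normalization applied to the model's layers. The network sizes, horizon, and reward model are illustrative assumptions.

```python
# Sketch: RP policy gradient through a learned-model unroll, with spectral
# normalization on the dynamics model to control its Lipschitz constant.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

state_dim, action_dim, horizon = 4, 2, 10  # assumed toy dimensions

def mlp(in_dim, out_dim, normalize=False):
    make = (lambda i, o: spectral_norm(nn.Linear(i, o))) if normalize else nn.Linear
    return nn.Sequential(make(in_dim, 64), nn.Tanh(), make(64, out_dim))

dynamics = mlp(state_dim + action_dim, state_dim, normalize=True)  # normalized model
reward_model = mlp(state_dim + action_dim, 1)
policy = mlp(state_dim, action_dim)
log_std = nn.Parameter(torch.zeros(action_dim))

def rp_return(s):
    """Differentiable H-step return: gradients flow through the model unroll."""
    total = torch.zeros(())
    for _ in range(horizon):
        eps = torch.randn(action_dim)             # reparameterization noise
        a = policy(s) + log_std.exp() * eps       # a = mu(s) + sigma * eps
        sa = torch.cat([s, a])
        total = total + reward_model(sa).squeeze()
        s = dynamics(sa)                          # unroll the learned model
    return total

s0 = torch.randn(state_dim)
loss = -rp_return(s0)
loss.backward()  # RP policy gradient w.r.t. policy parameters and log_std
```

Without the `normalize=True` flag on the dynamics model, each unroll step can multiply gradient magnitudes by the layers' spectral norms, which is the exploding-variance mechanism the paper analyzes.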
Related papers
- Polynomial Chaos Expanded Gaussian Process [2.287415292857564]
In complex and unknown processes, global models are initially generated over the entire experimental space.
This study addresses the need for models that effectively represent both global and local experimental spaces.
arXiv Detail & Related papers (2024-05-02T07:11:05Z)
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z)
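For orientation, a generic DP-SGD step in NumPy: clip each per-example gradient and add Gaussian noise to the average. The paper's contribution is algorithms that exploit the sparsity of the individual gradients to reach near-optimal rates, which this plain sketch does not reproduce; `clip`, `noise_mult`, and `lr` are illustrative assumptions.

```python
# Generic DP-SGD step: per-example clipping plus Gaussian noise.
import numpy as np

def dp_sgd_step(w, per_example_grads, clip=1.0, noise_mult=1.0, lr=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:  # sparse vectors in the paper's setting
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip / len(clipped), size=w.shape)
    return w - lr * (mean + noise)

w = np.zeros(5)
grads = [np.array([0.0, 2.0, 0.0, 0.0, -1.0]),  # each example touches few coords
         np.array([3.0, 0.0, 0.0, 0.0, 0.0])]
w = dp_sgd_step(w, grads)
```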
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shift and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HD).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (LASSO) penalty function, wherein the weights are obtained from the unpenalised f-HD maximum-likelihood estimates.
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
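As a pocket illustration of the adaptive LASSO idea (on a plain linear model, not the paper's f-HD setting): obtain weights from an unpenalised fit, then solve an ordinary LASSO on rescaled columns. The data and the `alpha` value are synthetic assumptions.

```python
# Adaptive LASSO via column rescaling: minimize ||y - Xb||^2 + lam * sum w_j |b_j|
# by fitting a standard LASSO on X_j / w_j and unscaling the coefficients.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

# Step 1: the unpenalised estimate supplies the adaptive weights w_j = 1/|b_j|.
b_init = LinearRegression(fit_intercept=False).fit(X, y).coef_
weights = 1.0 / (np.abs(b_init) + 1e-8)

# Step 2: ordinary LASSO on rescaled columns, then undo the scaling.
lasso = Lasso(alpha=0.05, fit_intercept=False).fit(X / weights, y)
beta_hat = lasso.coef_ / weights
print(np.round(beta_hat, 2))  # the small true coefficients are shrunk to zero
```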
- Adaptive Latent Factor Analysis via Generalized Momentum-Incorporated Particle Swarm Optimization [6.2303427193075755]
A stochastic gradient descent (SGD) algorithm is an effective learning strategy for building a latent factor analysis (LFA) model on a high-dimensional and incomplete (HDI) matrix.
A particle swarm optimization (PSO) algorithm is commonly adopted to make an SGD-based LFA model's hyperparameters, i.e., the learning rate and the regularization coefficient, self-adaptive.
This paper incorporates more historical information into each particle's evolutionary process to avoid premature convergence.
arXiv Detail & Related papers (2022-08-04T03:15:07Z)
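A standard PSO loop over the two hyperparameters, for intuition only; the paper's generalized momentum-incorporated variant folds more historical velocity information into each particle's update, which this sketch omits. The `validation_error` objective is a synthetic stand-in for actually training an LFA model.

```python
# Standard PSO searching over (learning rate, regularization coefficient).
import numpy as np

rng = np.random.default_rng(0)

def validation_error(lr, lam):
    # Stand-in for training an SGD-based LFA model and measuring error;
    # the toy optimum sits at lr = 1e-2, lam = 1e-3.
    return (np.log10(lr) + 2.0) ** 2 + (np.log10(lam) + 3.0) ** 2

n, dims = 10, 2
pos = 10 ** rng.uniform(-5, 0, size=(n, dims))   # particles: (lr, lambda)
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([validation_error(*p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(50):
    r1, r2 = rng.random((n, dims)), rng.random((n, dims))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 1e-6, 1.0)
    f = np.array([validation_error(*p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(gbest)  # best (lr, lambda) found so far
```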
- Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs [0.0]
We propose nonparametric Bayesian estimators for causal inference exploiting Regression Discontinuity/Kink (RD/RK) designs.
These estimators are extended to hierarchical GP models with an intermediate Bayesian neural network layer.
Monte Carlo simulations show that our estimators perform similarly and often better than competing estimators in terms of precision, coverage and interval length.
arXiv Detail & Related papers (2021-10-03T04:23:56Z)
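A minimal sharp-design sketch with off-the-shelf GP regression: fit one GP on each side of the cutoff and read the treatment effect off the gap in predictions at the cutoff. The paper's hierarchical GP models with an intermediate Bayesian neural network layer, and its fuzzy-design estimators, go well beyond this; the data here are simulated.

```python
# Sharp RD with two plain GPs: the effect is the jump at the cutoff.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
effect = 2.0                                       # true jump at the cutoff
y = np.sin(2 * x) + effect * (x >= 0) + rng.normal(scale=0.2, size=200)

kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=0.04)
left = GaussianProcessRegressor(kernel=kernel).fit(x[x < 0].reshape(-1, 1), y[x < 0])
right = GaussianProcessRegressor(kernel=kernel).fit(x[x >= 0].reshape(-1, 1), y[x >= 0])

cutoff = np.array([[0.0]])
tau_hat = right.predict(cutoff)[0] - left.predict(cutoff)[0]
print(round(float(tau_hat), 2))                    # should be close to 2.0
```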
- Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model [32.61834127169759]
Reinforcement learning (RL) shows great potential in sequential decision-making.
Mainstream RL algorithms are data-driven; they usually achieve better asymptotic performance but converge much more slowly than model-driven methods.
This paper proposes the Mixed Policy Gradient (MPG) algorithm, which fuses empirical data and the transition model in the policy gradient (PG) to accelerate convergence without sacrificing performance.
arXiv Detail & Related papers (2021-02-23T06:05:17Z)
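The fusion idea in miniature: blend a data-driven (likelihood-ratio) gradient with a model-driven one, trading the former's asymptotic accuracy against the latter's fast early convergence. The annealing schedule below is a placeholder assumption, not the weighting rule the paper derives.

```python
# Placeholder fusion of data-driven and model-driven policy gradients.
import numpy as np

def mixed_policy_gradient(g_data, g_model, step, anneal=100.0):
    """Weight shifts from the model-driven toward the data-driven gradient."""
    w = min(1.0, step / anneal)      # assumed schedule, not the paper's
    return w * g_data + (1.0 - w) * g_model

g_data = np.array([0.9, -1.1])   # high-variance, unbiased (from rollouts)
g_model = np.array([1.0, -1.0])  # low-variance, possibly biased (from model)
print(mixed_policy_gradient(g_data, g_model, step=10))
```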
- Unbiased Gradient Estimation for Distributionally Robust Learning [2.1777837784979277]
We consider a new approach based on distributionally robust learning (DRL) that applies gradient descent to the inner problem.
Our algorithm efficiently estimates gradients through multi-level Monte Carlo randomization.
arXiv Detail & Related papers (2020-12-22T21:35:03Z)
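The multi-level Monte Carlo randomization trick in its simplest form: sample a random truncation level and reweight a telescoping difference, so that a sequence of increasingly accurate biased estimates yields a single unbiased estimate of the limit. The geometric level distribution and the toy `level_estimate` are assumptions; the paper applies the idea to gradients of the DRL inner problem.

```python
# Randomized multi-level estimator: E[Y_0 + (Y_{N+1} - Y_N)/p_N] = lim Y_n.
import numpy as np

rng = np.random.default_rng(0)

def level_estimate(n):
    # Stand-in for a biased estimate that improves with level n (e.g. a
    # gradient from 2**n inner samples): partial sums converging to 2.0.
    return sum(0.5 ** k for k in range(n + 1))

def mlmc_unbiased(q=0.6):
    n = rng.geometric(1 - q) - 1               # P(N = n) = (1-q) * q**n
    p_n = (1 - q) * q ** n
    delta = level_estimate(n + 1) - level_estimate(n)
    return level_estimate(0) + delta / p_n     # unbiased for the limit

draws = [mlmc_unbiased() for _ in range(20000)]
print(round(float(np.mean(draws)), 3))         # close to the limit, 2.0
```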
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of the coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
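A sketch of coordinate-wise ZO estimation with importance sampling: draw coordinates from a non-uniform distribution, probe each with two function queries, and reweight so the estimate stays unbiased. The sampling distribution and objective below are made up; designing that distribution in a principled way is the paper's contribution.

```python
# ZO gradient via importance-sampled coordinate finite differences.
import numpy as np

rng = np.random.default_rng(0)

def f(x):                          # black box: only function queries allowed
    return 4.0 * x[0] ** 2 + x[1] ** 2 + x[2] ** 2

def zo_gradient(x, probs, k=4, mu=1e-4):
    d = len(x)
    g = np.zeros(d)
    for i in rng.choice(d, size=k, replace=True, p=probs):
        e = np.zeros(d)
        e[i] = 1.0
        fd = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)  # 2 queries/coordinate
        g[i] += fd / (k * probs[i])    # importance weight keeps E[g] unbiased
    return g

x = np.array([1.0, -2.0, 0.5])
probs = np.array([0.6, 0.2, 0.2])      # assumed "important" coordinate first
for _ in range(200):                   # plain ZO gradient descent
    x -= 0.05 * zo_gradient(x, probs)
print(np.round(x, 3))                  # approaches the minimizer at 0
```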
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
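An extragradient-style extrapolation step of the general kind such a framework unifies, shown on a toy quadratic: evaluate the gradient at a lookahead point and apply it at the current iterate. The step sizes are arbitrary assumptions; the paper evaluates such schemes on ResNet, LSTM, and Transformer training.

```python
# Extrapolation (lookahead) step on a toy quadratic 0.5 * w^T A w.
import numpy as np

A = np.diag([10.0, 1.0])

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
eta, gamma = 0.05, 0.05              # update and extrapolation step sizes
for _ in range(100):
    w_look = w - gamma * grad(w)     # extrapolation (lookahead) point
    w = w - eta * grad(w_look)       # update using the lookahead gradient
print(np.round(w, 4))                # near the optimum at the origin
```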