Riemannian Proximal Policy Optimization
- URL: http://arxiv.org/abs/2005.09195v1
- Date: Tue, 19 May 2020 03:37:59 GMT
- Title: Riemannian Proximal Policy Optimization
- Authors: Shijun Wang, Baocheng Zhu, Chen Li, Mingzhe Wu, James Zhang, Wei Chu, Yuan Qi
- Abstract summary: We employ a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems.
To model the policy in the MDP problem, we use a Gaussian mixture model (GMM) and formulate the task as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices.
- Score: 15.532281292327031
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose a general Riemannian proximal optimization
algorithm with guaranteed convergence to solve Markov decision process (MDP)
problems. To model policy functions in MDP, we employ a Gaussian mixture model
(GMM) and formulate the resulting problem as a nonconvex optimization problem in
the Riemannian space of positive semidefinite matrices. For two given policy
functions, we also provide a lower bound on policy improvement by using bounds
derived from the Wasserstein distance between GMMs. Preliminary experiments show
the efficacy of our proposed Riemannian proximal policy optimization algorithm.
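The policy-improvement bound above is stated in terms of Wasserstein distances between GMMs. The paper's exact bound is not reproduced here. As a minimal sketch, assuming diagonal covariances (a simplification of the positive semidefinite covariances in the abstract), the squared 2-Wasserstein distance between two Gaussians has the closed form ||m1 - m2||^2 + sum_i (sqrt(v1_i) - sqrt(v2_i))^2. Pairing the components of two mixtures that share the same weights then gives a simple upper bound for the mixtures. The function names `gaussian_w2_sq` and `gmm_w2_sq_upper_bound` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A mixture component: (mean vector, diagonal of the covariance matrix).
Component = Tuple[Sequence[float], Sequence[float]]


def gaussian_w2_sq(mean1: Sequence[float], var1: Sequence[float],
                   mean2: Sequence[float], var2: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance between N(mean1, diag(var1)) and N(mean2, diag(var2)).

    For diagonal (hence commuting) covariances, the general closed form
    ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})
    reduces to a per-coordinate sum of (sqrt(v1) - sqrt(v2))^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


def gmm_w2_sq_upper_bound(weights: Sequence[float],
                          comps1: List[Component],
                          comps2: List[Component]) -> float:
    """Upper bound on W2^2 between two GMMs with identical mixture weights.

    Coupling component i of the first mixture optimally with component i of
    the second, and mixing these couplings with weights w_i, is a valid
    coupling of the two mixtures, so its cost bounds W2^2 from above.
    """
    return sum(w * gaussian_w2_sq(m1, v1, m2, v2)
               for w, (m1, v1), (m2, v2) in zip(weights, comps1, comps2))


print(gaussian_w2_sq([0.0, 0.0], [1.0, 4.0], [3.0, 4.0], [1.0, 1.0]))  # 26.0
```

The component pairing is fixed here. Optimizing over all couplings of the component weights would tighten the bound, but it requires solving a small transport problem.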
Related papers
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised mean-seeking approximation of the optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
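The summary above refers to the DPO objective. Below is a minimal sketch of the standard DPO loss from the original DPO formulation, not the EXO method proposed in this paper. It assumes sequence-level log-probabilities under the trained policy and a frozen reference policy are already available. The implicit reward of a response is beta * (log pi_theta - log pi_ref), and the per-example loss is -log sigmoid(beta * margin). The function names and the default beta = 0.1 are hypothetical choices.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss computed from sequence log-probabilities.

    The margin compares how much the policy raised the chosen response's
    log-probability (relative to the reference) with how much it raised the
    rejected one. The loss is the Bradley-Terry negative log-likelihood that
    the chosen response is preferred.
    """
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return neg_log_sigmoid(beta * margin)


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```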
Date: 2024-02-01T18:51:54Z
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely used approaches.
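Moreau-envelope methods such as the one summarized above are built on proximal operators. The sketch below assumes the simple nonsmooth function f(x) = |x|, chosen only for illustration; it is not the paper's ADMM. It shows that the proximal operator is soft-thresholding, and that the Moreau envelope M_lam(x) = min_y f(y) + (y - x)^2 / (2 lam) is the smooth Huber function: x^2 / (2 lam) for |x| <= lam, and |x| - lam / 2 otherwise. The function names are hypothetical.

```python
def prox_abs(x: float, lam: float) -> float:
    """Proximal operator of lam * |.|, i.e. soft-thresholding at level lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_envelope_abs(x: float, lam: float) -> float:
    """Moreau envelope of |.|, evaluated at its minimizer y = prox_abs(x, lam)."""
    y = prox_abs(x, lam)
    return abs(y) + (y - x) ** 2 / (2 * lam)


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, prox_abs(x, 1.0), moreau_envelope_abs(x, 1.0))
```

The envelope is differentiable everywhere even though |x| is not. This smoothing is what lets envelope-based methods use gradient-type arguments on nonsmooth terms.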
Date: 2023-08-31T14:16:30Z
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
Date: 2023-06-20T17:27:31Z
- Policy Gradient Algorithms Implicitly Optimize by Continuation [7.351769270728942]
We argue that exploration in policy-gradient algorithms consists in computing a continuation of the return of the policy at hand, and that the variance of policies should be history-dependent rather than chosen to maximize the return.
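The continuation view can be illustrated on a toy problem. Assume a hypothetical one-dimensional "return" J(x) = cos(10x) - x^2 of a deterministic policy parameter x. A Gaussian policy with standard deviation sigma then effectively optimizes the smoothed surrogate E[J(x + eps)], with eps ~ N(0, sigma^2), and for this J the surrogate has a closed form. Counting local maxima on a grid shows the surrogate losing local extrema as sigma grows. This is a sketch of the idea, not the paper's analysis, and all names are hypothetical.

```python
import math


def toy_return(x: float, k: float = 10.0, a: float = 1.0) -> float:
    """Hypothetical nonconvex return of a deterministic policy parameter x."""
    return math.cos(k * x) - a * (x * x)


def smoothed_return(x: float, sigma: float, k: float = 10.0, a: float = 1.0) -> float:
    """E[toy_return(x + eps)] for eps ~ N(0, sigma^2), in closed form.

    Uses E[cos(k(x + eps))] = cos(kx) * exp(-k^2 sigma^2 / 2)
    and E[(x + eps)^2] = x^2 + sigma^2.
    """
    return (math.cos(k * x) * math.exp(-0.5 * (k * sigma) ** 2)
            - a * (x * x + sigma * sigma))


def count_local_maxima(values):
    """Number of strict interior local maxima of a sampled 1-D function."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i] > values[i - 1] and values[i] > values[i + 1])


grid = [i / 100 for i in range(-200, 201)]  # x in [-2, 2], step 0.01
for sigma in (0.0, 0.1, 0.5):
    print(sigma, count_local_maxima([smoothed_return(x, sigma) for x in grid]))
```

The oscillating term is damped by exp(-k^2 sigma^2 / 2), so with a larger sigma the concave -x^2 part dominates and a single maximum remains.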
Date: 2023-05-11T14:50:20Z
- Global Algorithms for Mean-Variance Optimization in Markov Decision Processes [8.601670707452083]
Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming.
We propose a new approach to find the globally optimal policy for combined metrics of steady-state mean and variance in an infinite-horizon undiscounted MDP.
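To make the combined metric concrete, the sketch below evaluates the steady-state mean mu and variance sigma^2 of the per-step reward, together with the objective mu - beta * sigma^2, for every deterministic policy of a tiny hypothetical two-state, two-action MDP. It assumes the steady-state variance is the stationary expectation of (r - mu)^2. It also uses brute-force enumeration, which is not the paper's algorithm and does not scale. The example only shows that the optimal policy can change with beta. Because the variance term depends on the policy's own mean mu, the objective does not split into per-step terms, which is one reason a standard Bellman recursion does not apply directly.

```python
from itertools import product

# Hypothetical MDP. REWARD[s][a] is the reward for action a in state s;
# SWITCH[s][a] is the probability of moving to the other state after action a in state s.
REWARD = [[1.0, 2.0], [0.0, 4.0]]
SWITCH = [[0.5, 0.5], [0.5, 0.25]]


def steady_state_stats(policy):
    """Steady-state mean and variance of the per-step reward under a deterministic policy."""
    p = SWITCH[0][policy[0]]  # P(0 -> 1)
    q = SWITCH[1][policy[1]]  # P(1 -> 0)
    pi = (q / (p + q), p / (p + q))  # stationary distribution of a 2-state chain
    rewards = (REWARD[0][policy[0]], REWARD[1][policy[1]])
    mean = sum(w * r for w, r in zip(pi, rewards))
    var = sum(w * (r - mean) ** 2 for w, r in zip(pi, rewards))
    return mean, var


def best_policy(beta):
    """Brute-force the deterministic stationary policy maximizing mean - beta * variance."""
    def score(policy):
        mean, var = steady_state_stats(policy)
        return mean - beta * var
    return max(product((0, 1), repeat=2), key=score)


for beta in (0.0, 1.0, 5.0):
    print(beta, best_policy(beta))
```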
Date: 2023-02-27T12:17:43Z
- Riemannian Optimization for Variance Estimation in Linear Mixed Models [0.0]
We take a completely novel view on parameter estimation in linear mixed models by exploiting the intrinsic geometry of the parameter space.
Our approach yields higher-quality variance parameter estimates than existing approaches.
Date: 2022-12-18T13:08:45Z
- First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces [93.35384756718868]
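Many min-max algorithms have been analyzed in the Euclidean setting, but translating these results to the Riemannian case has proved elusive.
We prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate.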
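The Riemannian RCEG method is not reproduced here. As a Euclidean illustration (the flat special case, with a bilinear objective chosen for illustration), the sketch below compares simultaneous gradient descent-ascent (GDA) with the extragradient method on f(x, y) = x * y, whose saddle point is (0, 0). The GDA iterates spiral outward, while the extragradient look-ahead step makes the last iterate contract toward the saddle point. The function names are hypothetical.

```python
def gda_step(x: float, y: float, eta: float):
    """Simultaneous gradient descent in x and ascent in y on f(x, y) = x * y."""
    return x - eta * y, y + eta * x


def extragradient_step(x: float, y: float, eta: float):
    """Extragradient on f(x, y) = x * y: extrapolate, then update with the look-ahead gradient."""
    x_half, y_half = x - eta * y, y + eta * x  # extrapolation (look-ahead) step
    return x - eta * y_half, y + eta * x_half  # update using gradients at the look-ahead point


def run(step, iters: int = 500, eta: float = 0.1) -> float:
    """Distance of the last iterate from the saddle point (0, 0), starting at (1, 1)."""
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return (x * x + y * y) ** 0.5


print("GDA:", run(gda_step))
print("extragradient:", run(extragradient_step))
```

Each GDA step multiplies the squared distance to the saddle point by 1 + eta^2. Each extragradient step multiplies it by 1 - eta^2 + eta^4. Only the latter shrinks the last iterate.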
Date: 2022-06-04T18:53:44Z
- Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close-to-optimal a priori computational guarantees.
Date: 2022-04-11T09:37:04Z
- Near Optimal Policy Optimization via REPS [33.992374484681704]
Relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
However, there are no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses generative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
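REPS-style updates reweight samples by exponentiated returns. As a minimal episodic sketch, assume a uniform sampling distribution over N returns R_i and a KL budget epsilon. The new distribution is then p_i proportional to exp(R_i / eta), where the temperature eta minimizes the convex dual g(eta) = eta * epsilon + eta * log((1/N) * sum_i exp(R_i / eta)). The ternary search over log(eta) is an illustrative choice, not the gradient-based solver or the generative-access technique analyzed in the paper. All names are hypothetical.

```python
import math


def _log_mean_exp(values):
    """Numerically stable log((1/n) * sum(exp(v))) via the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def reps_dual(eta, returns, epsilon):
    """Sample-based REPS dual g(eta) = eta * epsilon + eta * log mean exp(R / eta)."""
    return eta * epsilon + eta * _log_mean_exp([r / eta for r in returns])


def reps_weights(returns, epsilon, lo=-10.0, hi=10.0, iters=200):
    """Minimize the dual over log(eta) by ternary search, then return (p, eta)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if reps_dual(math.exp(m1), returns, epsilon) < reps_dual(math.exp(m2), returns, epsilon):
            hi = m2
        else:
            lo = m1
    eta = math.exp(0.5 * (lo + hi))
    scaled = [r / eta for r in returns]
    m = max(scaled)
    unnorm = [math.exp(s - m) for s in scaled]  # p_i proportional to exp(R_i / eta)
    total = sum(unnorm)
    return [u / total for u in unnorm], eta


def kl_to_uniform(p):
    """KL(p || uniform) for a discrete distribution p."""
    n = len(p)
    return sum(pi * math.log(n * pi) for pi in p if pi > 0)


p, eta = reps_weights([0.0, 0.5, 1.0, 2.0], epsilon=0.1)
print(eta, p, kl_to_uniform(p))
```

At the minimizer, g'(eta) = epsilon - KL(p || uniform). The returned weights therefore use up the whole KL budget, up to numerical tolerance, whenever the constraint is active.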
Date: 2021-03-17T16:22:59Z
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
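The stationarity measure in this rate is the norm of the gradient of the Moreau envelope. For a rho-weakly convex f and lam < 1/rho, the envelope is differentiable, with grad M_lam(x) = (x - prox_{lam f}(x)) / lam. The sketch below assumes the hypothetical weakly convex function f(y) = |y| - (rho/2) y^2, whose proximal operator has a closed form. It is not the adaptive algorithm analyzed in the paper.

```python
def prox_weakly_convex(x: float, lam: float, rho: float) -> float:
    """Prox of the rho-weakly convex f(y) = |y| - (rho / 2) * y^2, valid for lam < 1 / rho.

    Setting a subgradient of |y| - rho * y^2 / 2 + (y - x)^2 / (2 * lam) to zero
    gives soft-thresholding of x at lam, rescaled by 1 / (1 - lam * rho).
    """
    if lam * rho >= 1:
        raise ValueError("need lam < 1 / rho for the prox to be single-valued")
    if x > lam:
        return (x - lam) / (1 - lam * rho)
    if x < -lam:
        return (x + lam) / (1 - lam * rho)
    return 0.0


def moreau_grad_norm(x: float, lam: float, rho: float) -> float:
    """Norm of grad M_lam f(x) = (x - prox_{lam f}(x)) / lam, the stationarity measure."""
    return abs(x - prox_weakly_convex(x, lam, rho)) / lam


for x in (0.0, 0.5, 2.0, 3.0):
    print(x, moreau_grad_norm(x, lam=1.0, rho=0.5))
```

With rho = 0.5 and lam = 1, the measure vanishes at x = 2, a critical point of f (here a local maximum). This is expected from a first-order stationarity measure.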
Date: 2020-06-11T17:43:19Z
- A Riemannian Primal-dual Algorithm Based on Proximal Operator and its Application in Metric Learning [3.511851311025242]
We propose a primal-dual algorithm to optimize the primal and dual variables iteratively.
We prove convergence of the proposed algorithm and show its non-asymptotic convergence rate.
Preliminary experimental results on an optimal fund selection problem in fund of funds management showed its efficacy.
Date: 2020-05-19T03:31:01Z
This list is automatically generated from the titles and abstracts of the papers in this site.