Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.05357v1
- Date: Fri, 10 Jun 2022 21:09:44 GMT
- Title: Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective
Reinforcement Learning
- Authors: Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian
- Abstract summary: We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which systematically incorporates ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDPs.
- Score: 17.916366827429034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study policy optimization for Markov decision processes (MDPs) with
multiple reward value functions, which are to be jointly optimized according to
given criteria such as proportional fairness (smooth concave scalarization),
hard constraints (constrained MDP), and max-min trade-off. We propose an
Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which
can systematically incorporate ideas from well-performing first-order methods
into the design of policy optimization algorithms for multi-objective MDP
problems. Theoretically, the designed algorithms based on the ARNPG framework
achieve $\tilde{O}(1/T)$ global convergence with exact gradients. Empirically,
the ARNPG-guided algorithms also demonstrate superior performance compared to
several existing policy-gradient-based approaches in both exact-gradient and
sample-based settings.
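
To make the framework concrete, below is a minimal sketch, assuming a tabular MDP with exact policy evaluation and a simple weighted-sum scalarization of the objectives. Each step performs a KL-regularized NPG update pulled toward an anchor policy, and the anchor is periodically reset to the current policy (the "anchor-changing" idea). The function names (`arnpg`, `policy_eval`) and the particular step-size and regularization constants are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch (not the paper's reference code) of an anchor-changing
# regularized NPG loop on a tabular MDP with multiple reward objectives.
import numpy as np

def policy_eval(P, r, pi, gamma):
    """Exact Q^pi for one reward: solve (I - gamma * P_pi) V = r_pi."""
    S, A, _ = P.shape
    P_pi = np.einsum("sa,saz->sz", pi, P)   # state transitions under pi
    r_pi = np.einsum("sa,sa->s", pi, r)     # expected reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum("saz,z->sa", P, V)

def arnpg(P, rewards, weights, gamma=0.9, eta=1.0, lam=0.1, K=10, T=200):
    """Anchor-changing regularized NPG (sketch).

    Each iteration applies the standard tabular softmax NPG closed form
    with an extra KL pull toward an anchor policy; the anchor is reset to
    the current policy every K iterations. Step-size and regularization
    details differ from the paper's algorithms.
    """
    S, A, _ = P.shape
    pi = np.full((S, A), 1.0 / A)
    anchor = pi.copy()
    for t in range(T):
        if t % K == 0:
            anchor = pi.copy()              # change the anchor
        # Scalarize: weighted sum of per-objective Q-values (one simple
        # smooth concave criterion; the paper covers broader criteria).
        Q = sum(w * policy_eval(P, r, pi, gamma)
                for w, r in zip(weights, rewards))
        # Closed-form update: geometric mix of current policy, anchor,
        # and exp(eta * Q), renormalized per state.
        logits = ((1 - eta * lam) * np.log(pi)
                  + eta * lam * np.log(anchor)
                  + eta * Q)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
    return pi

# Tiny random MDP with two reward objectives and equal weights.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
rewards = [rng.random((S, A)), rng.random((S, A))]
print(arnpg(P, rewards, weights=[0.5, 0.5]).round(3))
```

The anchor reset is the key design choice: regularizing toward a fixed reference policy would bias the fixed point, whereas periodically moving the anchor to the current iterate keeps the acceleration benefit of the proximal term while still converging to an optimum of the scalarized objective.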