Structured Policy Iteration for Linear Quadratic Regulator
- URL: http://arxiv.org/abs/2007.06202v1
- Date: Mon, 13 Jul 2020 06:03:15 GMT
- Title: Structured Policy Iteration for Linear Quadratic Regulator
- Authors: Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao
- Abstract summary: We introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy.
Such a structured policy, with (block) sparsity or low-rank structure, can have significant advantages over the standard LQR policy.
In both the known-model and model-free settings, we prove convergence under a proper choice of parameters.
- Score: 40.52288246664592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear quadratic regulator (LQR) is one of the most popular frameworks to
tackle continuous Markov decision process tasks. With its fundamental theory
and tractable optimal policy, LQR has been revisited and analyzed in recent
years, in terms of reinforcement learning scenarios such as the model-free or
model-based setting. In this paper, we introduce the \textit{Structured Policy
Iteration} (S-PI) for LQR, a method capable of deriving a structured linear
policy. Such a structured policy, with (block) sparsity or low-rank structure,
can have significant advantages over the standard LQR policy: it is more
interpretable, memory-efficient, and well-suited to the distributed setting. To
derive such a policy, we first formulate a regularized LQR problem for the case
where the model is known. Our Structured Policy Iteration (S-PI) algorithm,
which alternates between a policy evaluation step and a policy improvement
step, then solves this regularized LQR efficiently. We further extend the S-PI
algorithm to the model-free setting where a smoothing procedure is adopted to
estimate the gradient. In both the known-model and model-free settings, we
provide a convergence analysis under a proper choice of parameters. Finally,
experiments demonstrate the advantages of S-PI in balancing LQR performance
against the level of structure by varying the weight parameter.
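To make the abstract's setup concrete, the sketch below reads the regularized problem as minimizing the usual LQR cost f(K) of a linear policy u = -Kx plus a structure-inducing penalty lambda * r(K) (for example, an l1 or group norm for (block) sparsity, or a nuclear norm for low rank), where lambda is the weight parameter varied in the experiments. This reading, together with the proximal update, step size, initialization, and stopping rule below, is an illustrative assumption based only on the abstract rather than the paper's exact algorithm; the gradient formula is the standard known-model LQR policy gradient.
```python
# Minimal sketch of a known-model S-PI-style loop for a sparsity-regularized LQR.
# The alternation of a policy evaluation step and a policy improvement step follows
# the abstract; the prox operator, step sizes, and toy system are assumptions.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_evaluation(A, B, Q, R, K, Sigma0):
    """Evaluate policy u = -Kx: value matrix P_K, state covariance Sigma_K, gradient of f."""
    Acl = A - B @ K                                          # closed-loop dynamics
    P_K = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)    # P = Acl' P Acl + Q + K' R K
    Sigma_K = solve_discrete_lyapunov(Acl, Sigma0)           # Sigma = Acl Sigma Acl' + Sigma0
    grad = 2.0 * ((R + B.T @ P_K @ B) @ K - B.T @ P_K @ A) @ Sigma_K  # standard LQR gradient
    return P_K, Sigma_K, grad

def soft_threshold(K, tau):
    """Proximal operator of tau * ||K||_1 (one possible choice of the regularizer r)."""
    return np.sign(K) * np.maximum(np.abs(K) - tau, 0.0)

def structured_policy_iteration(A, B, Q, R, lam=0.1, eta=1e-3, iters=500):
    """Alternate policy evaluation and a proximal-gradient policy improvement step."""
    n, m = B.shape
    K = np.zeros((m, n))          # assumes the zero policy is stabilizing for the toy system
    Sigma0 = np.eye(n)            # initial-state covariance (assumed identity)
    for _ in range(iters):
        _, _, grad = policy_evaluation(A, B, Q, R, K, Sigma0)
        K = soft_threshold(K - eta * grad, eta * lam)        # gradient step + prox of lam*r(K)
    return K

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(4) + 0.05 * rng.standard_normal((4, 4))  # a (likely) stable toy system
    B = rng.standard_normal((4, 2))
    K = structured_policy_iteration(A, B, Q=np.eye(4), R=np.eye(2))
    print("sparsity pattern of K:\n", (np.abs(K) > 1e-6).astype(int))
```
In the model-free setting described above, the exact gradient returned by policy_evaluation would be replaced by an estimate obtained from a smoothing procedure (e.g., zeroth-order perturbations of K), as the abstract indicates.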
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance similar to or stronger than PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
- A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence [15.807079236265714]
We introduce a novel framework for policy optimization based on mirror descent.
We obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization.
arXiv Detail & Related papers (2023-01-30T18:21:48Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the threat of overfitting is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results both for the case where we assume a model for the underlying dynamics and for the case where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation [23.76925146112261]
This paper studies the robustness of reinforcement learning algorithms to errors in the learning process.
It is shown that policy iteration for LQR is inherently robust to small errors in the learning process.
arXiv Detail & Related papers (2020-08-25T11:11:28Z)
- Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces.
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
We then propose Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z)
- Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems [3.3343656101775365]
We show that the Gauss-Newton method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which stabilizes the closed-loop dynamics in the mean-square sense.
We present an example to support our theory.
arXiv Detail & Related papers (2020-02-10T21:13:42Z)