Revisiting LQR Control from the Perspective of Receding-Horizon Policy
Gradient
- URL: http://arxiv.org/abs/2302.13144v3
- Date: Wed, 31 Jan 2024 20:58:43 GMT
- Title: Revisiting LQR Control from the Perspective of Receding-Horizon Policy
Gradient
- Authors: Xiangyuan Zhang, Tamer Başar
- Abstract summary: We revisit the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG).
We provide a fine-grained sample complexity analysis for RHPG to learn a control policy that is both stabilizing and $\epsilon$-close to the optimal LQR solution.
- Score: 2.1756081703276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit in this paper the discrete-time linear quadratic regulator (LQR)
problem from the perspective of receding-horizon policy gradient (RHPG), a
newly developed model-free learning framework for control applications. We
provide a fine-grained sample complexity analysis for RHPG to learn a control
policy that is both stabilizing and $\epsilon$-close to the optimal LQR
solution, and our algorithm does not require knowing a stabilizing control
policy for initialization. Combined with the recent application of RHPG in
learning the Kalman filter, we demonstrate the general applicability of RHPG in
linear control and estimation with streamlined analyses.
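As a rough illustration of the receding-horizon idea, the sketch below decomposes the LQR problem into finite-horizon subproblems solved backward in time, each by a model-free (zeroth-order) policy-gradient update, and it starts from zero gains, i.e., without a stabilizing initialization. This is a minimal sketch under placeholder assumptions (the system matrices, horizon length, smoothing radius, step size, and iteration counts are illustrative), not the paper's exact algorithm or its sample-complexity-optimal parameter choices.

```python
import numpy as np

# Placeholder LQR instance (assumption): 2 states, 1 input, unstable open loop.
rng = np.random.default_rng(0)
A = np.array([[1.1, 0.3],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

N = 30                      # receding-horizon length (assumption)
n, m = A.shape[0], B.shape[1]

def rollout_cost(gains, h, x0):
    """Quadratic cost accumulated from stage h to N under per-stage gains u_t = -K_t x_t.
    The cost is simulated here; in a model-free run it would come from system rollouts."""
    x, cost = x0, 0.0
    for t in range(h, N):
        u = -gains[t] @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost + float(x @ Q @ x)      # terminal cost

# No stabilizing initialization is required: start every stage gain at zero.
gains = [np.zeros((m, n)) for _ in range(N)]

# Solve the subproblems backward in time: learn K_h with the later gains held fixed.
for h in reversed(range(N)):
    K = gains[h].copy()
    for _ in range(200):                # PG iterations per stage (assumption)
        x0 = rng.standard_normal(n)     # random initial state for this subproblem
        U = rng.standard_normal((m, n))
        U /= np.linalg.norm(U)          # random perturbation direction
        r, lr = 0.05, 0.02              # smoothing radius / step size (assumptions)
        gains[h] = K + r * U
        c_plus = rollout_cost(gains, h, x0)
        gains[h] = K - r * U
        c_minus = rollout_cost(gains, h, x0)
        # Unscaled two-point zeroth-order gradient estimate (the dimension
        # factor is folded into the step size for this sketch).
        K = K - lr * (c_plus - c_minus) / (2 * r) * U
    gains[h] = K

print("learned first-stage gain K_0:", gains[0])
```

In practice the rollout costs would come from interaction with the real system rather than a simulator, which is what makes such a scheme model-free.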
Related papers
- Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
arXiv Detail & Related papers (2024-08-05T14:11:51Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Global Convergence of Receding-Horizon Policy Search in Learning
Estimator Designs [3.0811185425377743]
We introduce the receding-horizon policy gradient (RHPG) algorithm for learning estimator designs.
RHPG is the first algorithm with provable global convergence in learning the optimal linear estimator.
arXiv Detail & Related papers (2023-09-09T16:03:49Z) - Learning the Kalman Filter with Fine-Grained Sample Complexity [4.301206378997673]
We develop the first end-to-end sample complexity analysis of model-free policy gradient (PG) methods in discrete-time infinite-horizon Kalman filtering.
Our results shed light on applying model-free PG methods to control a linear dynamical system whose state measurements could be corrupted by statistical noise and other (possibly adversarial) disturbances.
arXiv Detail & Related papers (2023-01-30T02:41:18Z) - Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear
Quadratic Control [85.22735611954694]
We study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using Thompson Sampling (TS).
We propose an efficient TS algorithm for the adaptive control of LQRs, TSAC, that attains $\tilde{O}(\sqrt{T})$ regret, even for multidimensional systems.
arXiv Detail & Related papers (2022-06-17T02:47:53Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex and call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Robust Reinforcement Learning: A Case Study in Linear Quadratic
Regulation [23.76925146112261]
This paper studies the robustness of reinforcement learning algorithms to errors in the learning process.
It is shown that policy iteration for LQR is inherently robust to small errors in the learning process.
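(A minimal policy-iteration sketch for the LQR setting appears after this list of related papers.)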
arXiv Detail & Related papers (2020-08-25T11:11:28Z) - Structured Policy Iteration for Linear Quadratic Regulator [40.52288246664592]
We introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy.
Such a structured policy with (block) sparsity or low-rank can have significant advantages over the standard LQR policy.
In both the known-model and model-free settings, we prove convergence under a proper choice of parameters.
arXiv Detail & Related papers (2020-07-13T06:03:15Z) - Zeroth-order Deterministic Policy Gradient [116.87117204825105]
We introduce Zeroth-order Deterministic Policy Gradient (ZDPG).
ZDPG approximates policy-reward gradients via two-point evaluations of the $Q$-function.
New finite sample complexity bounds for ZDPG improve upon existing results by up to two orders of magnitude.
arXiv Detail & Related papers (2020-06-12T16:52:29Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
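Returning to the robust-RL entry above: the policy-iteration scheme whose robustness that paper studies can be sketched in a few lines for discrete-time LQR, alternating a policy-evaluation Lyapunov solve with a policy-improvement step. This is a generic, model-based illustration under placeholder assumptions (the system matrices and the initial stabilizing gain are made up), not the cited paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Placeholder LQR instance (assumption).
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.2]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[1.0, 1.0]])              # assumed stabilizing initial gain, u = -K x

for _ in range(50):
    Acl = A - B @ K
    # Policy evaluation: P solves P = Acl' P Acl + Q + K' R K.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: K <- (R + B' P B)^{-1} B' P A.
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("policy-iteration gain K:", K)
```

Exact iterations of this form converge to the optimal LQR gain; the cited paper's point, as summarized above, is that the scheme remains well behaved when the evaluation and improvement steps are computed only approximately.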