Regret Analysis of Certainty Equivalence Policies in Continuous-Time
Linear-Quadratic Systems
- URL: http://arxiv.org/abs/2206.04434v1
- Date: Thu, 9 Jun 2022 11:47:36 GMT
- Title: Regret Analysis of Certainty Equivalence Policies in Continuous-Time
Linear-Quadratic Systems
- Authors: Mohamad Kazem Shirani Faradonbeh
- Abstract summary: This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical model of stochastic linear-quadratic systems.
We establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy quickly learns optimal control actions from a single state trajectory.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work studies theoretical performance guarantees of a ubiquitous
reinforcement learning policy for controlling the canonical model of stochastic
linear-quadratic systems. We show that a randomized certainty equivalent policy
addresses the exploration-exploitation dilemma for minimizing quadratic costs in
linear dynamical systems that evolve according to stochastic differential
equations. More precisely, we establish square-root-of-time regret bounds,
indicating that the randomized certainty equivalent policy quickly learns optimal
control actions from a single state trajectory. Further, we show that the regret
scales linearly with the number of unknown parameters. The presented analysis
introduces novel and useful technical approaches and sheds light on fundamental
challenges of continuous-time reinforcement learning.
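To make the policy concrete, here is a minimal sketch of a randomized certainty-equivalence loop, assuming a toy two-dimensional diffusion simulated by Euler-Maruyama: re-estimate the drift parameters by least squares, perturb the estimate slightly (the randomization), and apply the feedback gain from the resulting continuous-time Riccati equation. All constants, the update schedule, and the fixed perturbation scale are illustrative choices, not the paper's exact algorithm (whose randomization shrinks over time).

```python
# Hedged sketch of randomized certainty equivalence (CE) for a continuous-time
# LQ system dx = (A x + B u) dt + dW, discretized by Euler-Maruyama.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m, dt, T = 2, 1, 0.01, 20_000
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])   # unknown drift, used only to simulate
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)

x = np.zeros(n)
feats, targets = [], []
K = np.zeros((m, n))                             # current certainty-equivalent gain
for t in range(T):
    u = -K @ x + 0.1 * rng.standard_normal(m)    # feedback plus exploration noise
    dx = (A_true @ x + B_true @ u) * dt + np.sqrt(dt) * rng.standard_normal(n)
    feats.append(np.concatenate([x, u])); targets.append(dx / dt)
    x = x + dx
    if (t + 1) % 1000 == 0:                      # periodic re-estimation episodes
        Th, *_ = np.linalg.lstsq(np.asarray(feats), np.asarray(targets), rcond=None)
        Th = Th + 0.01 * rng.standard_normal(Th.shape)   # randomized estimate
        A_hat, B_hat = Th[:n].T, Th[n:].T                # split [A B]^T
        P = solve_continuous_are(A_hat, B_hat, Q, R)     # CE Riccati solution
        K = np.linalg.solve(R, B_hat.T @ P)              # CE feedback: u = -K x
```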
Related papers
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled stochastic differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to the coefficients' regularity.
Our method is available as an open-source Python library.
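As a toy illustration of this estimation problem (my own construction, not the paper's estimator or its library): least-squares drift fitting on Euler-Maruyama increments of a scalar controlled SDE, with an average diffusion level read off the quadratic variation.

```python
# Hedged sketch: fit the drift of dx = f(x,u) dt + sigma(x) dW by least squares
# on polynomial features; the drift, diffusion, and features are assumptions.
import numpy as np

rng = np.random.default_rng(1)
dt, T = 1e-3, 50_000
f = lambda x, u: -x**3 + u                # true drift (for simulation only)
sigma = lambda x: 0.5 * (1 + 0.1 * x**2)  # non-uniform diffusion

x, xs, us, dxs = 0.0, [], [], []
for _ in range(T):
    u = rng.uniform(-1, 1)                # exciting control input
    dx = f(x, u) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
    xs.append(x); us.append(u); dxs.append(dx)
    x += dx
xs, us, dxs = map(np.asarray, (xs, us, dxs))

Phi = np.stack([xs, xs**3, us, np.ones_like(xs)], axis=1)  # feature map
coef, *_ = np.linalg.lstsq(Phi, dxs / dt, rcond=None)      # drift coefficients
sig2_hat = np.mean(dxs**2) / dt    # average sigma^2 over the visited states
```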
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
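A crude discrete-time stand-in for this idea, using textbook TD actor-critic updates on a scalar LQ toy rather than the paper's continuous-time algorithm; all constants are untuned assumptions.

```python
# Toy model-free actor-critic for scalar LQ: critic V(x) = p x^2 via TD(0),
# actor gain k via a score-function step on the exploratory Gaussian policy.
import numpy as np

rng = np.random.default_rng(10)
a, b, q, r, gamma = 0.95, 0.3, 1.0, 0.1, 0.98   # system, costs, discount
k, p, s = 0.0, 0.0, 0.2          # actor gain, critic weight, exploration stddev
lr_a, lr_c = 1e-3, 5e-3          # untuned step sizes

x = 1.0
for _ in range(50_000):
    u = -k * x + s * rng.standard_normal()        # exploratory Gaussian policy
    c = q * x**2 + r * u**2                       # instantaneous cost
    x_next = a * x + b * u + 0.05 * rng.standard_normal()
    delta = c + gamma * p * x_next**2 - p * x**2  # TD error of the cost critic
    p += lr_c * delta * x**2                      # critic: semi-gradient TD(0)
    k -= lr_a * delta * (-(u + k * x) * x / s**2) # actor: score-function step
    x = x_next
```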
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
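A minimal sketch of the learn-stochastic, deploy-deterministic pattern on a scalar toy problem; the hyperpolicy, constants, and update are generic REINFORCE choices, not the paper's method.

```python
# REINFORCE on a Gaussian hyperpolicy over a scalar feedback gain k; only the
# deterministic mean is deployed. The stddev s is the exploration level.
import numpy as np

rng = np.random.default_rng(3)
a, b, q, r = 0.9, 0.5, 1.0, 0.1          # scalar linear system and cost weights

def rollout_cost(k, T=50):
    x, c = 1.0, 0.0
    for _ in range(T):
        u = -k * x
        c += q * x**2 + r * u**2
        x = a * x + b * u + 0.05 * rng.standard_normal()
    return c

mu, s, lr, baseline = 0.0, 0.3, 1e-3, 0.0    # hyperpolicy mean/stddev, step size
for _ in range(5000):
    k = mu + s * rng.standard_normal()       # sample a gain (exploration)
    c = rollout_cost(k)
    baseline += 0.05 * (c - baseline)        # running baseline reduces variance
    mu -= lr * (c - baseline) * (k - mu) / s**2  # score-function descent on cost
deterministic_gain = mu                      # deployed policy: u = -mu * x
```

A smaller exploration level s shrinks the gap between the sampled gains and the deployed mean but makes the gradient estimate noisier, which is the trade-off being tuned.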
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Non-asymptotic System Identification for Linear Systems with Nonlinear Policies [17.420749574975368]
This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies.
We provide a non-asymptotic error bound for least squares estimation when the data trajectory is generated by any nonlinear and/or time-varying policies.
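A short sketch of this estimator under assumed toy dimensions: generate a single trajectory under a saturating (hence nonlinear) policy with exploration noise, then regress next states on state-input pairs by ordinary least squares.

```python
# Single-trajectory least-squares identification of x' = A x + B u + w,
# with data collected under a nonlinear tanh policy (my toy choice).
import numpy as np

rng = np.random.default_rng(4)
n, m, T = 3, 2, 5000
A = 0.5 * np.eye(n) + 0.1 * rng.standard_normal((n, n))   # true (unknown) system
B = rng.standard_normal((n, m))

x, feats, nexts = np.zeros(n), [], []
for _ in range(T):
    u = np.tanh(x[:m]) + 0.1 * rng.standard_normal(m)      # nonlinear policy
    x_next = A @ x + B @ u + 0.1 * rng.standard_normal(n)
    feats.append(np.concatenate([x, u])); nexts.append(x_next)
    x = x_next

Theta, *_ = np.linalg.lstsq(np.asarray(feats), np.asarray(nexts), rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T                    # estimated [A B]
err = np.linalg.norm(np.hstack([A_hat, B_hat]) - np.hstack([A, B]))
```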
arXiv Detail & Related papers (2023-06-17T15:05:59Z)
- A New Approach to Learning Linear Dynamical Systems [19.47235707806519]
We provide the first polynomial-time algorithm for learning a linear dynamical system from a polynomial-length trajectory up to polynomial error in the system parameters.
Our algorithm is built on a method of moments estimator to directly estimate parameters from which the dynamics can be extracted.
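The basic moment identity behind such estimators, in a hedged toy form (the paper's method-of-moments algorithm is considerably more general): for x_{t+1} = A x_t + w_t with i.i.d. noise, E[x_{t+1} x_t^T] = A E[x_t x_t^T], so plugging in empirical covariances recovers A.

```python
# Method-of-moments estimate of A for a stable autonomous linear system,
# using empirical lag-1 and lag-0 covariances; system matrix is my toy choice.
import numpy as np

rng = np.random.default_rng(5)
n, T = 3, 50_000
A = np.array([[0.8, 0.1, 0.0], [0.0, 0.7, 0.2], [0.1, 0.0, 0.6]])

X = np.zeros((T + 1, n))
for t in range(T):
    X[t + 1] = A @ X[t] + rng.standard_normal(n)

M1 = X[1:].T @ X[:-1] / T          # empirical E[x_{t+1} x_t^T]
M0 = X[:-1].T @ X[:-1] / T         # empirical E[x_t x_t^T]
A_hat = M1 @ np.linalg.inv(M0)     # moment-based estimate of A
```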
arXiv Detail & Related papers (2023-01-23T16:07:57Z)
- Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
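As a sketch of the disturbance-action policy class mentioned above, with a myopic one-step gradient of my choosing rather than the counterfactual memory gradients that the nonstochastic-control regret analysis actually requires:

```python
# Disturbance-action controller (DAC): u_t = sum_i M_i w_{t-i}. Toy stable
# system with known nominal dynamics, so disturbances are recoverable.
import numpy as np

rng = np.random.default_rng(6)
n, m, H, lr = 2, 1, 5, 1e-3
A = np.array([[0.7, 0.2], [0.0, 0.8]])      # known nominal dynamics (toy)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(n), 0.1 * np.eye(m)
M = np.zeros((H, m, n))                     # DAC parameters M_1..M_H
W = np.zeros((H, n))                        # buffer of the last H disturbances

x = np.zeros(n)
for t in range(3000):
    u = np.einsum('hmn,hn->m', M, W)        # u_t = sum_i M_i w_{t-i}
    x_next = A @ x + B @ u + 0.1 * rng.standard_normal(n)
    w = x_next - (A @ x + B @ u)            # recovered disturbance
    g_u = 2 * (B.T @ Q @ x_next + R @ u)    # myopic gradient of one-step cost
    M -= lr * np.einsum('m,hn->hmn', g_u, W)
    W = np.roll(W, 1, axis=0); W[0] = w     # push newest disturbance
    x = x_next
```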
arXiv Detail & Related papers (2022-02-16T06:57:14Z)
- Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
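For illustration only, a generic windowed-OLS baseline for this model (the paper's specific two-estimate construction is not reproduced here): one OLS per window tracks the slowly drifting parameters, and a second OLS across window estimates recovers their dynamics.

```python
# Windowed OLS for y_t = x_t^T theta_t + noise with theta_{t+1} = A_th theta_t;
# the drift matrix, horizon, and window length are my toy assumptions.
import numpy as np

rng = np.random.default_rng(7)
d, T, win = 2, 2000, 200
A_th = np.array([[0.9998, 0.0005], [0.0, 0.9995]])  # slow, stable parameter drift
theta = np.array([1.0, -1.0])

Xs, ys = [], []
for _ in range(T):
    x = rng.standard_normal(d)
    ys.append(x @ theta + 0.1 * rng.standard_normal())
    Xs.append(x)
    theta = A_th @ theta
Xs, ys = np.asarray(Xs), np.asarray(ys)

ths = [np.linalg.lstsq(Xs[s:s + win], ys[s:s + win], rcond=None)[0]
       for s in range(0, T, win)]                     # first OLS: per-window theta
Th = np.asarray(ths)
Mhat, *_ = np.linalg.lstsq(Th[:-1], Th[1:], rcond=None)  # second OLS: ~ (A_th^win)^T
```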
arXiv Detail & Related papers (2021-12-29T23:37:06Z)
- Structure-Preserving Learning Using Gaussian Processes and Variational Integrators [62.31425348954686]
We propose the combination of a variational integrator for the nominal dynamics of a mechanical system and learning residual dynamics with Gaussian process regression.
We extend our approach to systems with known kinematic constraints and provide formal bounds on the prediction uncertainty.
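A minimal sketch of the idea, assuming a toy pendulum: integrate the nominal conservative dynamics with symplectic Euler (a first-order variational integrator) and fit a Gaussian process to the residual, here unmodeled friction. The paper embeds the GP inside the integrator and proves uncertainty bounds; this sketch only fits residuals post hoc.

```python
# Nominal pendulum via symplectic Euler plus GP regression on residual forces.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
dt, g, L = 0.01, 9.81, 1.0
nominal = lambda q, p: -(g / L) * np.sin(q)          # known conservative force
true_acc = lambda q, p: nominal(q, p) - 0.3 * p      # friction = unmodeled residual

q, p, Xs, ys = 0.8, 0.0, [], []
for _ in range(2000):
    acc = true_acc(q, p) + 0.01 * rng.standard_normal()
    Xs.append([q, p]); ys.append(acc - nominal(q, p))  # residual sample
    p += dt * acc                                      # symplectic Euler step
    q += dt * p

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gp.fit(np.asarray(Xs)[::20], np.asarray(ys)[::20])     # subsample for speed
mu, std = gp.predict(np.array([[0.5, 0.1]]), return_std=True)  # residual + uncertainty
```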
arXiv Detail & Related papers (2021-12-10T11:09:29Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret [0.0]
We present the first model-free algorithm that achieves $\sqrt{T}$ regret guarantees comparable to those of model-based methods.
Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space.
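A sketch of the two-point zeroth-order gradient estimate that underlies such model-free policy-gradient schemes; the toy system, smoothing radius, and step size are assumptions, and the paper's actual scheme and regret analysis are more refined.

```python
# Two-point zeroth-order policy gradient in the space of LQR feedback gains.
import numpy as np

rng = np.random.default_rng(8)
n, m = 2, 1
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)

def avg_cost(K, T=200):
    x, c = np.ones(n), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + 0.01 * rng.standard_normal(n)
    return c / T

K, rad, lr = np.zeros((m, n)), 0.05, 1e-3
for _ in range(500):
    U = rng.standard_normal((m, n))
    U /= np.linalg.norm(U)                         # random search direction
    g = (avg_cost(K + rad * U) - avg_cost(K - rad * U)) / (2 * rad) * U
    K -= lr * g                                    # gradient step in policy space
```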
arXiv Detail & Related papers (2021-02-25T00:25:41Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
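A hedged sketch of the optimism principle only, not LqgOpt itself (which treats partially observed LQG with principled confidence sets): sample candidate models around the current estimate and deploy the controller of the most optimistic one.

```python
# Optimism in the face of uncertainty, fully observed toy version: pick the
# sampled model with the lowest optimal cost-to-go and deploy its LQR gain.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(9)
n, m, radius = 2, 1, 0.05
A_hat = np.array([[0.9, 0.1], [0.0, 0.8]])         # current point estimate
B_hat = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)

best_cost, best = np.inf, None
for _ in range(50):                                # sampled "confidence ball"
    Ac = A_hat + radius * rng.standard_normal((n, n))
    Bc = B_hat + radius * rng.standard_normal((n, m))
    P = solve_discrete_are(Ac, Bc, Q, R)
    if np.trace(P) < best_cost:                    # optimistic model selection
        best_cost, best = np.trace(P), (Ac, Bc, P)
Ao, Bo, P = best
K = np.linalg.solve(R + Bo.T @ P @ Bo, Bo.T @ P @ Ao)   # deploy optimistic gain
```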
arXiv Detail & Related papers (2020-03-12T19:56:38Z)