Regret Analysis of Certainty Equivalence Policies in Continuous-Time
Linear-Quadratic Systems
- URL: http://arxiv.org/abs/2206.04434v1
- Date: Thu, 9 Jun 2022 11:47:36 GMT
- Title: Regret Analysis of Certainty Equivalence Policies in Continuous-Time
Linear-Quadratic Systems
- Authors: Mohamad Kazem Shirani Faradonbeh
- Abstract summary: This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical model of linear-quadratic systems.
We establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy learns optimal control actions fast from a single state trajectory.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work studies theoretical performance guarantees of a ubiquitous
reinforcement learning policy for controlling the canonical model of stochastic
linear-quadratic systems. We show that the randomized certainty equivalent policy
addresses the exploration-exploitation dilemma for minimizing quadratic costs
in linear dynamical systems that evolve according to stochastic differential
equations. More precisely, we establish square-root-of-time regret bounds,
indicating that the randomized certainty equivalent policy learns optimal control
actions fast from a single state trajectory. Further, linear scaling of the
regret with the number of parameters is shown. The presented analysis
introduces novel and useful technical approaches, and sheds light on
fundamental challenges of continuous-time reinforcement learning.
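To make the idea of a certainty equivalent policy concrete, the following is a minimal sketch for a scalar system dx = (a x + b u) dt + dw with running cost q x^2 + r u^2. It is not the paper's algorithm: the function names, the scalar setting, and the Gaussian perturbation of the estimates are assumptions made for illustration, since the abstract does not specify how the randomization is done. The policy treats the current estimates (a_hat, b_hat) as if they were the true parameters, solves the Riccati equation for them, and applies the resulting linear feedback u = k x.

```python
import math
import random


def riccati_scalar(a, b, q, r):
    """Positive root p of the scalar Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def ce_gain(a_hat, b_hat, q, r):
    """Certainty equivalent feedback gain k (u = k * x) computed from estimates."""
    p = riccati_scalar(a_hat, b_hat, q, r)
    return -b_hat * p / r


def randomized_ce_gain(a_hat, b_hat, q, r, scale, rng):
    """Perturb the estimates before computing the gain (assumed form of randomization).

    The perturbed b must be nonzero; a Gaussian draw hits zero with probability zero.
    """
    a_tilde = a_hat + scale * rng.gauss(0.0, 1.0)
    b_tilde = b_hat + scale * rng.gauss(0.0, 1.0)
    return ce_gain(a_tilde, b_tilde, q, r)


if __name__ == "__main__":
    rng = random.Random(0)
    k = randomized_ce_gain(1.0, 1.0, 1.0, 1.0, scale=0.1, rng=rng)
    print("randomized certainty equivalent gain:", k)
```

When the estimates equal the true parameters, the closed-loop drift a + b k equals -sqrt(a^2 + b^2 q / r), which is negative, so the feedback stabilizes the system. The perturbation scale is the knob that trades exploration against exploitation in this sketch.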