An Efficient Off-Policy Reinforcement Learning Algorithm for the
Continuous-Time LQR Problem
- URL: http://arxiv.org/abs/2303.17819v1
- Date: Fri, 31 Mar 2023 06:30:23 GMT
- Title: An Efficient Off-Policy Reinforcement Learning Algorithm for the
Continuous-Time LQR Problem
- Authors: Victor G. Lopez and Matthias A. Müller
- Abstract summary: An off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system.
We show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration.
- Score: 2.512827436728378
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, an off-policy reinforcement learning algorithm is designed to
solve the continuous-time LQR problem using only input-state data measured from
the system. Different from other algorithms in the literature, we propose the
use of a specific persistently exciting input as the exploration signal during
the data collection step. We then show that, using this persistently excited
data, the solution of the matrix equation in our algorithm is guaranteed to
exist and to be unique at every iteration. Convergence of the algorithm to the
optimal control input is also proven. Moreover, we formulate the policy
evaluation step as the solution of a Sylvester-transpose equation, which
increases the efficiency of its solution. Finally, a method to determine a
stabilizing policy to initialize the algorithm using only measured data is
proposed.
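For context, the sketch below shows the classical model-based policy iteration for the continuous-time LQR (Kleinman's algorithm): policy evaluation solves a Lyapunov equation, a special case of the Sylvester-type equations mentioned above, and policy improvement recomputes the gain. This is background only; the paper's contribution is an off-policy, data-driven variant in which the policy-evaluation step becomes a Sylvester-transpose equation built from persistently excited input-state data. The system matrices, weights, and tolerance below are illustrative assumptions.

```python
# Model-based policy iteration (Kleinman's algorithm) for continuous-time LQR.
# Shown only as the classical analogue of the paper's data-driven algorithm;
# all matrices below are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative system x_dot = A x + B u with cost integral of x'Qx + u'Ru.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz, so K = 0 is a stabilizing initial gain
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))            # initial stabilizing policy u = -K x
for _ in range(20):
    A_cl = A - B @ K
    # Policy evaluation: solve A_cl' P + P A_cl + Q + K' R K = 0
    # (a Lyapunov equation, i.e. a special Sylvester-type equation).
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Sanity check against the algebraic Riccati equation solution.
P_are = solve_continuous_are(A, B, Q, R)
print("policy-iteration gain:", K)
print("Riccati gain:         ", np.linalg.solve(R, B.T @ P_are))
```

When the model (A, B) is unknown, as in the paper, the Lyapunov solve above is exactly the step that must be replaced by an equation formed from measured trajectories.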
Related papers
- Learning Sparse Graphs via Majorization-Minimization for Smooth Node Signals [8.140698535149042]
We propose an algorithm for learning a sparse weighted graph by estimating its adjacency matrix.
We show that the proposed algorithm converges faster, in terms of the average number of iterations, than several existing methods in the literature.
arXiv Detail & Related papers (2022-02-06T17:06:13Z)
- A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis [5.180648702293017]
We propose a novel and efficient data-driven line search rule to adaptively determine the appropriate step size.
A large number of comparisons with state-of-the-art algorithms in linear and logistic regression problems show the stability, effectiveness and superiority of the proposed algorithms.
arXiv Detail & Related papers (2021-11-21T12:18:18Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
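As a toy illustration of bandit-based online algorithm selection under a runtime budget (not the method of the paper above, which designs more careful runtime-oriented losses for censored observations), the sketch below runs a UCB-style selector where the incurred loss is the runtime censored at the cutoff. The candidate solvers and their runtime distributions are synthetic assumptions.

```python
# Toy UCB-style online algorithm selection with censored (capped) runtimes.
# Solvers and runtime distributions are synthetic; the paper's censoring-aware
# losses and bandit variants are more refined than this illustration.
import math
import random

CUTOFF = 10.0                                   # per-instance time budget
true_mean_runtime = {"solver_a": 3.0, "solver_b": 6.0, "solver_c": 12.0}
algorithms = list(true_mean_runtime)

counts = {a: 0 for a in algorithms}
total_loss = {a: 0.0 for a in algorithms}

def run(algo):
    """Simulate running `algo` on one instance; runtime is censored at CUTOFF."""
    runtime = random.expovariate(1.0 / true_mean_runtime[algo])
    return min(runtime, CUTOFF)

for step in range(1, 1001):
    def lcb(a):
        # Lower confidence bound on expected loss (optimism for minimization).
        if counts[a] == 0:
            return float("-inf")                # force one initial pull per arm
        mean_loss = total_loss[a] / counts[a]
        return mean_loss - math.sqrt(2.0 * math.log(step) / counts[a])
    choice = min(algorithms, key=lcb)
    observed = run(choice)
    counts[choice] += 1
    total_loss[choice] += observed

print("selections:", counts)
print("mean censored runtimes:",
      {a: round(total_loss[a] / counts[a], 2) for a in algorithms})
```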
- Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems [15.410124023805249]
This paper studies the adaptive optimal stationary control of continuous-time linear systems with both additive and multiplicative noises.
A novel off-policy reinforcement learning algorithm, named optimistic least-squares-based iteration policy, is proposed.
arXiv Detail & Related papers (2021-07-16T09:27:02Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
arXiv Detail & Related papers (2020-08-20T02:18:00Z)
- Initializing Successive Linear Programming Solver for ACOPF using Machine Learning [0.0]
This paper examines various machine learning (ML) algorithms available in the Scikit-Learn library to initialize an SLP-ACOPF solver.
We evaluate the quality of each of these machine learning algorithms for predicting variables needed for a power flow solution.
The approach is tested on congested and non-congested 3-bus systems.
arXiv Detail & Related papers (2020-07-17T20:01:55Z)
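A rough sketch of the warm-start idea in the ACOPF entry above, using entirely synthetic data: a scikit-learn regressor maps bus loads to voltage magnitudes and angles, and the prediction would seed the SLP solver's first linearization point. The feature and target construction and the choice of a random forest are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: a scikit-learn regressor predicting power-flow quantities to
# warm-start an SLP-ACOPF solver. The "historical" data here is synthetic; in
# the paper, features and targets come from solved cases on 3-bus systems.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Features: bus load levels (MW); targets: voltage magnitudes and angles that
# a solver could use as its initial point.
n_samples, n_buses = 500, 3
loads = rng.uniform(50.0, 150.0, size=(n_samples, n_buses))
v_mag = 1.0 - 0.0005 * loads + 0.01 * rng.standard_normal(loads.shape)
v_ang = -0.002 * loads + 0.005 * rng.standard_normal(loads.shape)
targets = np.hstack([v_mag, v_ang])             # synthetic stand-in solutions

X_train, X_test, y_train, y_test = train_test_split(
    loads, targets, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))

# A new operating point: the prediction seeds the solver's first iteration.
new_load = np.array([[120.0, 80.0, 95.0]])
warm_start = model.predict(new_load)[0]
print("predicted warm start (|V| then angles):", np.round(warm_start, 3))
```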
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.