An Efficient Off-Policy Reinforcement Learning Algorithm for the
Continuous-Time LQR Problem
- URL: http://arxiv.org/abs/2303.17819v1
- Date: Fri, 31 Mar 2023 06:30:23 GMT
- Title: An Efficient Off-Policy Reinforcement Learning Algorithm for the
Continuous-Time LQR Problem
- Authors: Victor G. Lopez and Matthias A. Müller
- Abstract summary: An off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system.
We show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration.
- Score: 2.512827436728378
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, an off-policy reinforcement learning algorithm is designed to
solve the continuous-time LQR problem using only input-state data measured from
the system. Different from other algorithms in the literature, we propose the
use of a specific persistently exciting input as the exploration signal during
the data collection step. We then show that, using this persistently excited
data, the solution of the matrix equation in our algorithm is guaranteed to
exist and to be unique at every iteration. Convergence of the algorithm to the
optimal control input is also proven. Moreover, we formulate the policy
evaluation step as the solution of a Sylvester-transpose equation, which
increases the efficiency of its solution. Finally, a method to determine a
stabilizing policy to initialize the algorithm using only measured data is
proposed.
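For context, the sketch below shows the classical model-based policy iteration for the continuous-time LQR (Kleinman's algorithm): policy evaluation solves a Lyapunov equation, a special case of the Sylvester-type equations mentioned above, and policy improvement recomputes the gain. This is background only; the paper's contribution is an off-policy, data-driven variant in which the policy-evaluation step becomes a Sylvester-transpose equation built from persistently excited input-state data. The system matrices, weights, and tolerance below are illustrative assumptions.

```python
# Model-based policy iteration (Kleinman's algorithm) for continuous-time LQR.
# Shown only as the classical analogue of the paper's data-driven algorithm;
# all matrices below are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative system x_dot = A x + B u with cost integral of x'Qx + u'Ru.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz, so K = 0 is a stabilizing initial gain
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))            # initial stabilizing policy u = -K x
for _ in range(20):
    A_cl = A - B @ K
    # Policy evaluation: solve A_cl' P + P A_cl + Q + K' R K = 0
    # (a Lyapunov equation, i.e. a special Sylvester-type equation).
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Sanity check against the algebraic Riccati equation solution.
P_are = solve_continuous_are(A, B, Q, R)
print("policy-iteration gain:", K)
print("Riccati gain:         ", np.linalg.solve(R, B.T @ P_are))
```

When the model (A, B) is unknown, as in the paper, the Lyapunov solve above is exactly the step that must be replaced by an equation formed from measured trajectories.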
Related papers
- Learning Sparse Graphs via Majorization-Minimization for Smooth Node Signals [8.140698535149042]
We propose an algorithm for learning a sparse weighted graph by estimating its adjacency matrix.
We show that the proposed algorithm converges faster, in terms of the average number of iterations, than several existing methods in the literature.
arXiv Detail & Related papers (2022-02-06T17:06:13Z)
- A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis [5.180648702293017]
We propose a novel and efficient data-driven line search rule to adaptively determine the appropriate step size.
A large number of comparisons with state-of-the-art algorithms in linear and logistic regression problems show the stability, effectiveness and superiority of the proposed algorithms.
arXiv Detail & Related papers (2021-11-21T12:18:18Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
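As a toy illustration of bandit-based online algorithm selection under a runtime budget (not the method of the paper above, which designs more careful runtime-oriented losses for censored observations), the sketch below runs a UCB-style selector where the incurred loss is the runtime censored at the cutoff. The candidate solvers and their runtime distributions are synthetic assumptions.

```python
# Toy UCB-style online algorithm selection with censored (capped) runtimes.
# Solvers and runtime distributions are synthetic; the paper's censoring-aware
# losses and bandit variants are more refined than this illustration.
import math
import random

CUTOFF = 10.0                                   # per-instance time budget
true_mean_runtime = {"solver_a": 3.0, "solver_b": 6.0, "solver_c": 12.0}
algorithms = list(true_mean_runtime)

counts = {a: 0 for a in algorithms}
total_loss = {a: 0.0 for a in algorithms}

def run(algo):
    """Simulate running `algo` on one instance; runtime is censored at CUTOFF."""
    runtime = random.expovariate(1.0 / true_mean_runtime[algo])
    return min(runtime, CUTOFF)

for step in range(1, 1001):
    def lcb(a):
        # Lower confidence bound on expected loss (optimism for minimization).
        if counts[a] == 0:
            return float("-inf")                # force one initial pull per arm
        mean_loss = total_loss[a] / counts[a]
        return mean_loss - math.sqrt(2.0 * math.log(step) / counts[a])
    choice = min(algorithms, key=lcb)
    observed = run(choice)
    counts[choice] += 1
    total_loss[choice] += observed

print("selections:", counts)
print("mean censored runtimes:",
      {a: round(total_loss[a] / counts[a], 2) for a in algorithms})
```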
- Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems [15.410124023805249]
This paper studies the adaptive optimal stationary control of continuous-time linear systems with both additive and multiplicative noises.
A novel off-policy reinforcement learning algorithm, named optimistic least-squares-based iteration policy, is proposed.
arXiv Detail & Related papers (2021-07-16T09:27:02Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
arXiv Detail & Related papers (2020-08-20T02:18:00Z)
- Initializing Successive Linear Programming Solver for ACOPF using Machine Learning [0.0]
This paper examines various machine learning (ML) algorithms available in the Scikit-Learn library to initialize an SLP-ACOPF solver.
We evaluate the quality of each of these machine learning algorithms for predicting variables needed for a power flow solution.
The approach is tested on congested and non-congested 3-bus systems.
arXiv Detail & Related papers (2020-07-17T20:01:55Z)
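A rough sketch of the warm-start idea in the ACOPF entry above, using entirely synthetic data: a scikit-learn regressor maps bus loads to voltage magnitudes and angles, and the prediction would seed the SLP solver's first linearization point. The feature and target construction and the choice of a random forest are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: a scikit-learn regressor predicting power-flow quantities to
# warm-start an SLP-ACOPF solver. The "historical" data here is synthetic; in
# the paper, features and targets come from solved cases on 3-bus systems.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Features: bus load levels (MW); targets: voltage magnitudes and angles that
# a solver could use as its initial point.
n_samples, n_buses = 500, 3
loads = rng.uniform(50.0, 150.0, size=(n_samples, n_buses))
v_mag = 1.0 - 0.0005 * loads + 0.01 * rng.standard_normal(loads.shape)
v_ang = -0.002 * loads + 0.005 * rng.standard_normal(loads.shape)
targets = np.hstack([v_mag, v_ang])             # synthetic stand-in solutions

X_train, X_test, y_train, y_test = train_test_split(
    loads, targets, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))

# A new operating point: the prediction seeds the solver's first iteration.
new_load = np.array([[120.0, 80.0, 95.0]])
warm_start = model.predict(new_load)[0]
print("predicted warm start (|V| then angles):", np.round(warm_start, 3))
```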
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.