Data-Driven H-infinity Control with a Real-Time and Efficient
Reinforcement Learning Algorithm: An Application to Autonomous
Mobility-on-Demand Systems
- URL: http://arxiv.org/abs/2309.08880v1
- Date: Sat, 16 Sep 2023 05:02:41 GMT
- Title: Data-Driven H-infinity Control with a Real-Time and Efficient
Reinforcement Learning Algorithm: An Application to Autonomous
Mobility-on-Demand Systems
- Authors: Ali Aalipour and Alireza Khani
- Abstract summary: This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the H$_{\infty}$ control of linear discrete-time systems.
An adaptive optimal controller is designed and the parameters of the action and critic networks are learned online without the knowledge of the system dynamics.
- Score: 3.5897534810405403
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL) is a class of artificial intelligence algorithms
used to design adaptive optimal controllers through online learning. This
paper presents a model-free, real-time, data-efficient Q-learning-based
algorithm to solve the H$_{\infty}$ control of linear discrete-time systems.
The computational complexity is shown to reduce from
$\mathcal{O}(\underline{q}^3)$ in the literature to
$\mathcal{O}(\underline{q}^2)$ in the proposed algorithm, where $\underline{q}$
is quadratic in the sum of the dimensions of the state variables, control
inputs, and disturbance. An adaptive optimal controller is designed, and the
parameters of the action and critic networks are learned online without
knowledge of the system dynamics, making the proposed algorithm completely
model-free. Moreover, sufficient probing noise is required only in the first
iteration and does not affect the proposed algorithm. With no need for an
initial stabilizing policy, the algorithm converges to the closed-form solution
obtained by solving the Riccati equation. A simulation study applies the
proposed algorithm to real-time control of an autonomous mobility-on-demand
(AMoD) system in a real-world case study to evaluate its effectiveness.
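For context on the closed-form benchmark mentioned above, the following is a minimal sketch of solving the discrete-time H$_{\infty}$ game algebraic Riccati equation (GARE) by value iteration. This is a model-based reference routine, not the paper's model-free algorithm, and the system matrices, weights, and attenuation level in the example are illustrative assumptions.

```python
import numpy as np

def _saddle_blocks(P, A, B, D, R, gamma):
    # Quadratic form over the joint decision [u; w] and its cross term with x
    p = D.shape[1]
    G = np.block([
        [R + B.T @ P @ B, B.T @ P @ D],
        [D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(p)],
    ])
    L = np.vstack([B.T @ P @ A, D.T @ P @ A])
    return G, L

def hinf_gare(A, B, D, Q, R, gamma, iters=1000, tol=1e-10):
    """Value iteration on the GARE for x+ = A x + B u + D w with stage cost
    x'Qx + u'Ru - gamma^2 w'w. Assumes gamma exceeds the optimal attenuation
    level, so the block D'PD - gamma^2 I stays negative definite."""
    P = np.copy(Q)
    for _ in range(iters):
        G, L = _saddle_blocks(P, A, B, D, R, gamma)
        P_next = Q + A.T @ P @ A - L.T @ np.linalg.solve(G, L)
        if np.linalg.norm(P_next - P) < tol:
            P = P_next
            break
        P = P_next
    G, L = _saddle_blocks(P, A, B, D, R, gamma)
    K = np.linalg.solve(G, L)      # stacked saddle-point gains
    m = B.shape[1]
    return P, K[:m], K[m:]         # P, controller gain K_u, worst-case gain K_w

# Illustrative two-state example (all values assumed)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[1.0], [0.5]])
P, Ku, Kw = hinf_gare(A, B, D, Q=np.eye(2), R=np.eye(1), gamma=5.0)
```

A model-free scheme of the kind the abstract describes instead estimates the quadratic Q-function kernel of this zero-sum game directly from measured states, inputs, and disturbances, converging to the same fixed point P without ever forming (A, B, D).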
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for ARX systems under both open-loop and closed-loop data collection (a minimal least-squares ARX sketch appears after this list).
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning [0.19036571490366497]
We propose an online learning scheme to estimate the kernel matrix of the Q-function.
The obtained control gain and kernel matrix are proved to converge to the optimal ones.
arXiv Detail & Related papers (2020-10-13T08:51:06Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using the data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
arXiv Detail & Related papers (2020-08-20T02:18:00Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
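As a companion to the finite-time ARX identification entry above (forward-referenced there), the sketch below fits an ARX model by ordinary least squares from open-loop input-output data. The model orders, true coefficients, and noise level are all assumed for illustration; the cited paper's contribution is the finite-time guarantee for such estimators, not this basic recipe.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Ordinary least-squares fit of an ARX(na, nb) model:
    y[t] = a1*y[t-1] + ... + a_na*y[t-na]
         + b1*u[t-1] + ... + b_nb*u[t-nb] + noise."""
    T, lag = len(y), max(na, nb)
    # Regressor matrix built from lagged outputs and inputs
    Phi = np.column_stack(
        [y[lag - i:T - i] for i in range(1, na + 1)]
        + [u[lag - j:T - j] for j in range(1, nb + 1)]
    )
    theta, *_ = np.linalg.lstsq(Phi, y[lag:], rcond=None)
    return theta[:na], theta[na:]   # AR coefficients, input coefficients

# Open-loop data from an assumed true system, for illustration only
rng = np.random.default_rng(0)
u = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = (0.6 * y[t-1] - 0.2 * y[t-2]
            + 0.5 * u[t-1] + 0.1 * u[t-2]
            + 0.05 * rng.normal())

a_hat, b_hat = fit_arx(y, u)   # should be close to (0.6, -0.2) and (0.5, 0.1)
```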
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.