Safe Adaptive Learning-based Control for Constrained Linear Quadratic
Regulators with Regret Guarantees
- URL: http://arxiv.org/abs/2111.00411v1
- Date: Sun, 31 Oct 2021 05:52:42 GMT
- Title: Safe Adaptive Learning-based Control for Constrained Linear Quadratic
Regulators with Regret Guarantees
- Authors: Yingying Li, Subhro Das, Jeff Shamma, Na Li
- Abstract summary: We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
- Score: 11.627320138064684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the adaptive control of an unknown linear system with a quadratic
cost function subject to safety constraints on both the states and actions. The
challenges of this problem arise from the tension among safety, exploration,
performance, and computation. To address these challenges, we propose a
polynomial-time algorithm that guarantees feasibility and constraint
satisfaction with high probability under proper conditions. Our algorithm is
implemented on a single trajectory and does not require system restarts.
Further, we analyze the regret of our learning algorithm compared to the
optimal safe linear controller with known model information. The proposed
algorithm achieves $\tilde O(T^{2/3})$ regret, where $T$ is the number of
stages and $\tilde O(\cdot)$ absorbs logarithmic factors in $T$.
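To make the setting concrete, below is a minimal Python sketch of the constrained-LQR loop the abstract describes: unknown linear dynamics, quadratic stage cost, constraints on states and actions, and learning on a single trajectory without restarts. The specific system matrices, exploration noise, action clipping, and periodic certainty-equivalence (least-squares plus Riccati) update are illustrative assumptions for the sketch, not the paper's algorithm.

```python
# Minimal sketch of the constrained LQR setting (NOT the paper's algorithm):
# unknown dynamics x_{t+1} = A* x_t + B* u_t + w_t, quadratic stage cost,
# constraints on states and actions, learning along a single trajectory.
import numpy as np

rng = np.random.default_rng(0)
A_star = np.array([[0.9, 0.2], [0.0, 0.8]])   # true dynamics, unknown to the learner
B_star = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)                    # stage cost x'Qx + u'Ru
x_max, u_max = 5.0, 1.0                        # safety constraints |x_i| <= x_max, |u| <= u_max

def riccati_gain(A, B, Q, R, iters=200):
    """Certainty-equivalence LQR gain for u = -K x via value iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K

x = np.zeros(2)
K = np.zeros((1, 2))                           # start from a conservative (zero) gain
Z, Xn = [], []                                 # regression data (x_t, u_t) -> x_{t+1}
total_cost, T = 0.0, 500

for t in range(T):
    # Exploration noise plus clipping keeps the action inside its constraint set.
    u = np.clip(-K @ x + 0.2 * rng.standard_normal(1), -u_max, u_max)
    total_cost += float(x @ Q @ x + u @ R @ u)
    x_next = A_star @ x + B_star @ u + 0.01 * rng.standard_normal(2)
    Z.append(np.concatenate([x, u]))
    Xn.append(x_next)
    x = x_next                                 # the paper's method also keeps |x| within the state constraints
    if t >= 50 and t % 50 == 0:                # periodically re-estimate (A, B) and update the gain
        theta, *_ = np.linalg.lstsq(np.array(Z), np.array(Xn), rcond=None)
        A_hat, B_hat = theta.T[:, :2], theta.T[:, 2:]
        K = riccati_gain(A_hat, B_hat, Q, R)

print("average stage cost:", total_cost / T)
```

The paper's algorithm further guarantees, with high probability, that both states and actions remain within their constraint sets along the single trajectory; the sketch above only clips the action and offers no such guarantee.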
Related papers
- Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator [5.445357652101423]
The risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control.
We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret.
This is the first set of regret bounds for the episodic risk-sensitive linear quadratic regulator.
arXiv Detail & Related papers (2024-06-08T06:06:20Z) - Data-Driven H-infinity Control with a Real-Time and Efficient
Reinforcement Learning Algorithm: An Application to Autonomous
Mobility-on-Demand Systems [3.5897534810405403]
This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the $H_\infty$ control problem for linear discrete-time systems.
An adaptive optimal controller is designed and the parameters of the action and critic networks are learned online without knowledge of the system dynamics.
arXiv Detail & Related papers (2023-09-16T05:02:41Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size (a minimal sketch of a log-barrier step appears after this list).
We demonstrate the effectiveness of our approach on minimizing constraint violations in safe reinforcement learning policy tasks.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Exploration for Constrained Reinforcement Learning with Provable
Guarantees [2.379828460137829]
We propose a model-based safe RL algorithm that we call the Optimistic-Pessimistic Safe Reinforcement Learning (OPSRL) algorithm.
We show that it achieves an $\tilde{\mathcal{O}}(S^2\sqrt{A H^7 K}/(\bar{C} - \bar{C}_b))$ cumulative regret without violating the safety constraints during learning.
arXiv Detail & Related papers (2021-12-01T23:21:48Z) - Safe Online Convex Optimization with Unknown Linear Safety Constraints [0.0]
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret $O(T^{2/3})$.
arXiv Detail & Related papers (2021-11-14T19:49:19Z) - Dynamic Regret Minimization for Control of Non-stationary Linear
Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z) - Finite-time System Identification and Adaptive Control in Autoregressive
Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z) - Safe Reinforcement Learning with Linear Function Approximation [48.75026009895308]
We introduce safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold.
We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation.
We show that SLUCB-QVI and RSLUCB-QVI, with no safety violation, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching that of their unsafe counterparts.
arXiv Detail & Related papers (2021-06-11T08:46:57Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured in prior work.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
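As referenced in the LBSGD entry above, here is a minimal, hedged sketch of a log-barrier gradient step on a toy constrained problem. The objective, the single constraint, the barrier weight, and the backtracking step-size rule are illustrative assumptions for the sketch, not the LBSGD algorithm itself.

```python
# Hedged sketch of a log-barrier gradient step on a toy problem:
# minimize f(x) subject to g(x) <= 0 by descending the barrier surrogate
# f(x) - eta * log(-g(x)), shrinking the step until the iterate stays
# strictly feasible (illustrative only, not the LBSGD algorithm).
import numpy as np

def f(x):                 # toy objective: pull x toward (2, 2)
    return float(np.sum((x - 2.0) ** 2))

def grad_f(x):
    return 2.0 * (x - 2.0)

def g(x):                 # safety constraint ||x||^2 <= 1, written as g(x) <= 0
    return float(np.sum(x ** 2) - 1.0)

def grad_g(x):
    return 2.0 * x

def barrier_step(x, eta=0.1, step=0.1, backtrack=0.5):
    """One gradient step on f(x) - eta*log(-g(x)); backtrack to remain strictly feasible."""
    grad = grad_f(x) - eta * grad_g(x) / g(x)   # gradient of the barrier surrogate
    x_new = x - step * grad
    while g(x_new) >= 0.0:                      # never leave the safe set during learning
        step *= backtrack
        x_new = x - step * grad
    return x_new

x = np.zeros(2)                                  # strictly feasible starting point
for _ in range(300):
    x = barrier_step(x)
print("iterate:", x, "constraint value:", g(x))
```

The paper's contribution lies in how the step size is chosen and in the resulting safety guarantees for black-box and reinforcement-learning settings; the sketch only illustrates the barrier-surrogate structure.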