Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems
- URL: http://arxiv.org/abs/2111.03772v1
- Date: Sat, 6 Nov 2021 01:30:51 GMT
- Title: Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems
- Authors: Yuwei Luo, Varun Gupta, Mladen Kolar
- Abstract summary: We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$, where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider the problem of controlling a Linear Quadratic Regulator (LQR)
system over a finite horizon $T$ with fixed and known cost matrices $Q,R$, but
unknown and non-stationary dynamics $\{A_t, B_t\}$. The sequence of dynamics
matrices can be arbitrary, but with a total variation, $V_T$, assumed to be
$o(T)$ and unknown to the controller. Under the assumption that a sequence of
stabilizing, but potentially sub-optimal controllers is available for all $t$,
we present an algorithm that achieves the optimal dynamic regret of
$\tilde{\mathcal{O}}\left(V_T^{2/5}T^{3/5}\right)$. With piece-wise constant
dynamics, our algorithm achieves the optimal regret of
$\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches. The crux
of our algorithm is an adaptive non-stationarity detection strategy, which
builds on an approach recently developed for contextual Multi-armed Bandit
problems. We also argue that non-adaptive forgetting (e.g., restarting or using
sliding window learning with a static window size) may not be regret optimal
for the LQR problem, even when the window size is optimally tuned with the
knowledge of $V_T$. The main technical challenge in the analysis of our
algorithm is to prove that the ordinary least squares (OLS) estimator has a
small bias when the parameter to be estimated is non-stationary. Our analysis
also highlights that the key motif driving the regret is that the LQR problem
is in spirit a bandit problem with linear feedback and locally quadratic cost.
This motif is more universal than the LQR problem itself, and therefore we
believe our results should find wider application.
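As an informal sanity check on the headline rates (a back-of-envelope balance, not the paper's argument), one can split the horizon into blocks of length $H$, charge roughly $\sqrt{H}$ regret to learn each nearly stationary block, and let within-block drift enter through the locally quadratic cost:

```latex
% Heuristic balance behind the \tilde{O}(V_T^{2/5} T^{3/5}) rate.
% Blocks of length H; per-block estimation regret ~ sqrt(H) (the minimax
% rate for stationary LQR); a within-block drift of size v costs ~ H v^2
% because the cost is locally quadratic. With V_T spread evenly across
% the T/H blocks, v ~ V_T H / T, so
\[
\mathrm{Regret}
\;\lesssim\; \underbrace{\frac{T}{H}\sqrt{H}}_{\text{estimation}}
\;+\; \underbrace{\frac{T}{H}\cdot H\Big(\frac{V_T H}{T}\Big)^{2}}_{\text{drift}}
\;=\; \frac{T}{\sqrt{H}} \;+\; \frac{V_T^{2}H^{2}}{T},
\]
% which is minimized at H ~ (T / V_T)^{4/5}, giving Regret ~ V_T^{2/5} T^{3/5}.
```

With piecewise-constant dynamics the same logic gives $\sum_s \sqrt{T_s} \le \sqrt{ST}$ over the $S$ stationary segments by Cauchy-Schwarz; the point of the adaptive detection strategy is to realize these trade-offs without knowing $V_T$ or $S$ in advance.

A minimal sketch of the detection-triggered forgetting idea follows; the exploratory input, test statistic, and thresholds are hypothetical placeholders, not the paper's procedure (which wraps the given stabilizing controllers and calibrated confidence bounds around the OLS estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 1, 2000  # state dim, input dim, horizon

def random_system():
    """Random stable (A, B); a stand-in for the unknown true dynamics."""
    A = rng.standard_normal((n, n))
    A *= 0.7 / max(1e-6, np.max(np.abs(np.linalg.eigvals(A))))  # spectral radius 0.7
    B = rng.standard_normal((n, m))
    return A, B

A, B = random_system()
x = np.zeros(n)
X, Z = [], []                        # targets x_{t+1} and features [x_t; u_t]
theta_hat = np.zeros((n, n + m))     # OLS estimate of [A B] since last restart
restarts = [0]

for t in range(T):
    if t == T // 2:                  # an abrupt switch in the true dynamics
        A, B = random_system()
    u = 0.1 * rng.standard_normal(m) # exploratory input (stabilizing baseline omitted)
    x_next = A @ x + B @ u + 0.01 * rng.standard_normal(n)
    Z.append(np.concatenate([x, u])); X.append(x_next)

    if len(Z) >= 5 * (n + m):
        Zm, Xm = np.stack(Z), np.stack(X)
        theta_new = np.linalg.lstsq(Zm, Xm, rcond=None)[0].T
        # Crude non-stationarity test (hypothetical threshold): restart when
        # recent one-step residuals exceed what a constant parameter would allow.
        resid = np.linalg.norm(Xm[-20:] - Zm[-20:] @ theta_new.T, axis=1).mean()
        if resid > 0.05 + 2.0 / np.sqrt(len(Z)):
            X, Z = [], []            # forget everything; re-estimate from scratch
            restarts.append(t)
        else:
            theta_hat = theta_new

    x = x_next

print("restart times:", restarts)
print("final estimation error:", np.linalg.norm(theta_hat - np.hstack([A, B])))
```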
Related papers
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
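A common instantiation of the weighted strategy is exponentially discounted least squares: keep all past data but geometrically down-weight stale samples instead of restarting. A minimal sketch under that reading (the discount `gamma`, ridge `lam`, and toy data are illustrative):

```python
import numpy as np

def discounted_ols(Z, y, gamma=0.99, lam=1.0):
    """Weighted ridge regression minimizing
    sum_t gamma^(T-t) * (y_t - z_t @ theta)^2 + lam * ||theta||^2,
    so the estimator tracks a slowly drifting parameter."""
    T, d = Z.shape
    w = gamma ** np.arange(T - 1, -1, -1)          # weight gamma^(T-t)
    G = (Z * w[:, None]).T @ Z + lam * np.eye(d)   # weighted Gram matrix
    b = (Z * w[:, None]).T @ y
    return np.linalg.solve(G, b)

# A drifting one-dimensional parameter: theta_t moves from 1 to 2.
rng = np.random.default_rng(1)
theta_t = np.linspace(1.0, 2.0, 500)
Z = rng.standard_normal((500, 1))
y = Z[:, 0] * theta_t + 0.1 * rng.standard_normal(500)
print(discounted_ols(Z, y, gamma=0.97))            # close to the recent value ~2
```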
- Optimal Dynamic Regret in LQR Control [23.91519151164528]
We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control.
We provide an online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3}, 1\})$.
arXiv Detail & Related papers (2022-06-18T18:00:21Z)
- Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
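ARX identification itself is plain least squares on lagged outputs and inputs; the paper's contribution is the finite-time analysis, not the estimator. A minimal open-loop sketch (orders, coefficients, and noise level are illustrative):

```python
import numpy as np

def fit_arx(y, u, p=2, q=2):
    """OLS fit of y_t = a_1 y_{t-1} + ... + a_p y_{t-p}
                      + b_1 u_{t-1} + ... + b_q u_{t-q} + e_t
    from a single input-output trajectory; returns (a_hat, b_hat)."""
    k = max(p, q)
    Phi = np.stack([np.concatenate([y[t - p:t][::-1], u[t - q:t][::-1]])
                    for t in range(k, len(y))])
    coef = np.linalg.lstsq(Phi, y[k:], rcond=None)[0]
    return coef[:p], coef[p:]

rng = np.random.default_rng(2)
u = rng.standard_normal(1000)
y = np.zeros(1000)
for t in range(2, 1000):   # true system: a = (0.5, -0.2), b = (1.0, 0.3)
    y[t] = (0.5 * y[t - 1] - 0.2 * y[t - 2]
            + 1.0 * u[t - 1] + 0.3 * u[t - 2]
            + 0.05 * rng.standard_normal())
print(fit_arx(y, u))       # estimates close to the true coefficients
```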
- Geometric Exploration for Online Control [38.87811800375421]
We study the control of an unknown linear dynamical system under general convex costs.
The objective is to minimize regret against the class of disturbance feedback controllers.
arXiv Detail & Related papers (2020-10-25T18:11:28Z)
- Black-Box Control for Linear Dynamical Systems [40.352938608995174]
We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions.
Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret.
arXiv Detail & Related papers (2020-07-13T19:43:19Z)
- Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
arXiv Detail & Related papers (2020-07-13T16:30:47Z)
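For reference, the primitive being counted there is cheap: one Riccati solve yields the optimal gain for a given model, so an $\epsilon$-optimistic controller costs $O(\log(1/\epsilon))$ such solves. A minimal certainty-equivalence version (the optimistic relaxation itself is beyond this sketch; the toy system is illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal infinite-horizon discrete LQR gain from one Riccati solve:
    u_t = -K x_t with K = (R + B'PB)^{-1} B'PA, P solving the DARE."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double integrator
B = np.array([[0.0], [0.1]])
print(lqr_gain(A, B, np.eye(2), np.eye(1)))
```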
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured in prior work.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
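"Naive exploration" here means certainty equivalence plus decaying isotropic excitation: act optimally for the current model estimate and add a little noise. One control step, sketched (the $t^{-1/4}$ schedule and constants are illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(3)
A_hat = np.array([[0.9, 0.1], [0.0, 0.8]])   # current estimate of (A, B)
B_hat = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

P = solve_discrete_are(A_hat, B_hat, Q, R)   # certainty-equivalent gain
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

x, t = np.array([1.0, -0.5]), 100
sigma = t ** -0.25                           # excitation variance ~ 1/sqrt(t)
u = -K @ x + sigma * rng.standard_normal(1)  # "naive" exploratory input
print(u)
```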
- Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
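To make the parametrization concrete: a disturbance-action controller sets $u_t = \sum_i M_i w_{t-i}$, which is linear in the parameters $M$, so a truncated counterfactual cost is convex in $M$ and online gradient descent applies. A toy sketch with a numerical gradient (horizon, step size, and surrogate loss are illustrative simplifications of the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, H, T, lr, eps = 2, 1, 3, 300, 0.02, 1e-4
A = np.array([[0.6, 0.1], [0.0, 0.5]])     # known, stable toy system
B = np.array([[0.0], [1.0]])

def dac_action(M, ws):
    """Disturbance-action control u = sum_i M_i w_{-1-i}; linear in M."""
    u = np.zeros(m)
    for i in range(min(H, len(ws))):
        u = u + M[i] @ ws[-1 - i]
    return u

def rollout_cost(M, ws):
    """Counterfactual quadratic cost of playing the fixed policy M against
    the recorded disturbances ws, starting from the zero state."""
    y, c = np.zeros(n), 0.0
    for k in range(len(ws)):
        u = dac_action(M, ws[:k])
        y = A @ y + B @ u + ws[k]
        c += float(y @ y + u @ u)
    return c

M = np.zeros((H, m, n))                    # controller parameters
x, ws = np.zeros(n), []
for t in range(T):
    u = dac_action(M, ws)
    w = 0.1 * rng.standard_normal(n)       # disturbance (adversarial in general)
    x = A @ x + B @ u + w
    ws = (ws + [w])[-3 * H:]               # truncated disturbance memory
    base = rollout_cost(M, ws)             # numerical gradient of the
    g = np.zeros_like(M)                   # truncated counterfactual loss
    for idx in np.ndindex(*M.shape):
        Mp = M.copy(); Mp[idx] += eps
        g[idx] = (rollout_cost(Mp, ws) - base) / eps
    M -= lr * g                            # online gradient step

print("learned parameter norms:", [round(float(np.linalg.norm(Mi)), 3) for Mi in M])
```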