Black-Box Control for Linear Dynamical Systems
- URL: http://arxiv.org/abs/2007.06650v3
- Date: Wed, 17 Feb 2021 22:10:12 GMT
- Title: Black-Box Control for Linear Dynamical Systems
- Authors: Xinyi Chen, Elad Hazan
- Abstract summary: We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions.
Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret.
- Score: 40.352938608995174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of controlling an unknown linear time-invariant
dynamical system from a single chain of black-box interactions, with no access
to resets or offline simulation. Under the assumption that the system is
controllable, we give the first efficient algorithm that is capable of
attaining sublinear regret in a single trajectory under the setting of online
nonstochastic control. This resolves an open problem on the stochastic LQR
problem, and in a more challenging setting that allows for adversarial
perturbations and adversarially chosen and changing convex loss functions.
We give finite-time regret bounds for our algorithm on the order of
$2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) T^{2/3})$ for
general nonstochastic control, and $2^{\tilde{O}(\mathcal{L})} +
\tilde{O}(\text{poly}(\mathcal{L}) \sqrt{T})$ for black-box LQR, where
$\mathcal{L}$ is the system size, an upper bound on the dimension. The
crucial step is a new system identification method that is robust to
adversarial noise, but incurs exponential cost.
To complete the picture, we investigate the complexity of the online
black-box control problem, and give a matching lower bound of
$2^{\Omega(\mathcal{L})}$ on the regret, showing that the additional
exponential cost is inevitable. This lower bound holds even in the noiseless
setting, and applies to any, randomized or deterministic, black-box control
method.
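The setting above is an LTI system controlled over a single trajectory, with regret measured against a comparator class. A minimal scalar sketch of that setting (illustrative constants `a`, `b`, random rather than adversarial disturbances, and regret against the best linear gain on a grid; not the paper's algorithm) might look like:

```python
import random

# Hypothetical scalar instance of the online control setting:
# an unknown system x_{t+1} = a*x_t + b*u_t + w_t, controlled in a
# single trajectory with no resets. All constants are illustrative.
random.seed(0)
a, b = 0.9, 1.0          # unknown to the learner in the real problem
T = 2000
noise = [random.uniform(-0.1, 0.1) for _ in range(T)]

def rollout(gain):
    """Run the linear policy u = -gain * x and return total quadratic loss."""
    x, loss = 0.0, 0.0
    for t in range(T):
        u = -gain * x
        loss += x * x + u * u          # quadratic (LQR-style) cost
        x = a * x + b * u + noise[t]   # system transition
    return loss

learner_loss = rollout(0.5)                            # one fixed, naive policy
best_loss = min(rollout(k / 100) for k in range(101))  # best gain in hindsight
regret = learner_loss - best_loss
print(f"regret of the naive policy over T={T}: {regret:.3f}")
```

Sublinear regret means this gap, divided by $T$, vanishes as the horizon grows; the sketch only makes the bookkeeping concrete.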
Related papers
- Optimal Dynamic Regret in LQR Control [23.91519151164528]
We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control.
We provide an online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3}, 1\})$.
arXiv Detail & Related papers (2022-06-18T18:00:21Z)
- Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$, where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
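As an illustration of the kind of identification problem discussed, here is a minimal scalar sketch: plain least squares on open-loop data from a hypothetical ARX model $y_t = a\,y_{t-1} + b\,u_{t-1} + e_t$ (parameters `a_true`, `b_true` are made up; this is not the cited paper's method or its guarantee):

```python
import random

# Scalar ARX model y_t = a*y_{t-1} + b*u_{t-1} + e_t, identified from
# open-loop data by ordinary least squares. Constants are illustrative.
random.seed(1)
a_true, b_true = 0.7, 0.5
T = 5000
y, us = [0.0], []
for t in range(T):
    u = random.uniform(-1, 1)   # persistently exciting open-loop input
    us.append(u)
    y.append(a_true * y[-1] + b_true * u + random.gauss(0, 0.01))

# Normal equations of the 2-parameter regression, solved in closed form:
#   [Syy Syu] [a]   [ry]
#   [Syu Suu] [b] = [ru]
Syy = sum(y[t] * y[t] for t in range(T))
Syu = sum(y[t] * us[t] for t in range(T))
Suu = sum(us[t] * us[t] for t in range(T))
ry = sum(y[t + 1] * y[t] for t in range(T))
ru = sum(y[t + 1] * us[t] for t in range(T))
det = Syy * Suu - Syu * Syu
a_hat = (Suu * ry - Syu * ru) / det
b_hat = (Syy * ru - Syu * ry) / det
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

Finite-time guarantees of the kind the paper studies bound how fast `(a_hat, b_hat)` concentrate around the true parameters as a function of the sample size.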
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
- Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
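A scalar sketch of the Riccati computation referenced here (illustrative constants `A`, `B`, `Q`, `R`; plain value iteration rather than the paper's Lagrangian relaxation) reaches the fixed point to accuracy `eps` in a number of iterations logarithmic in $1/\epsilon$:

```python
# Value iteration on the scalar discrete-time Riccati equation
#   P = Q + A^2 P - (A B P)^2 / (R + B^2 P)
# for a 1-d system. All constants are hypothetical.
A, B, Q, R = 0.9, 1.0, 1.0, 1.0
eps = 1e-10
P, iters = Q, 0
while True:
    P_next = Q + A * A * P - (A * B * P) ** 2 / (R + B * B * P)
    iters += 1
    if abs(P_next - P) < eps:
        P = P_next
        break
    P = P_next
K = A * B * P / (R + B * B * P)   # optimal feedback gain u = -K x
print(f"P={P:.6f}, K={K:.6f}, iterations={iters}")
```

Because the iteration contracts linearly toward the fixed point, each extra digit of accuracy costs only a constant number of iterations, which is the flavor of the $O(\log(1/\epsilon))$ count above.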
arXiv Detail & Related papers (2020-07-13T16:30:47Z)
- Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)
- Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
- Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to a convex controller parametrization yields a new controller which attains sublinear regret against a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.