Naive Exploration is Optimal for Online LQR
- URL: http://arxiv.org/abs/2001.09576v4
- Date: Wed, 4 Oct 2023 02:22:31 GMT
- Title: Naive Exploration is Optimal for Online LQR
- Authors: Max Simchowitz, Dylan J. Foster
- Abstract summary: We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem.
- Score: 49.681825576239355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of online adaptive control of the linear quadratic
regulator, where the true system parameters are unknown. We prove new upper and
lower bounds demonstrating that the optimal regret scales as
$\widetilde{\Theta}({\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T}})$, where $T$ is
the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space,
and $d_{\mathbf{x}}$ is the dimension of the system state. Notably, our lower
bounds rule out the possibility of a $\mathrm{poly}(\log{}T)$-regret algorithm,
which had been conjectured due to the apparent strong convexity of the problem.
Our upper bound is attained by a simple variant of $\textit{certainty
equivalent control}$, where the learner selects control inputs according to
the optimal controller for their estimate of the system while injecting
exploratory random noise. While this approach was shown to achieve
$\sqrt{T}$-regret by Mania et al. (2019), we show that if the learner
continually refines their estimates of the system matrices, the method attains
optimal dimension dependence as well.
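To make the scheme concrete, here is a minimal sketch of this certainty equivalent loop in Python: regress observed transitions to estimate $(A, B)$, play the optimal controller for the current estimate, and inject Gaussian exploration noise. The dynamics, the noise scale `sigma_u`, and the doubling refinement schedule are illustrative assumptions, not the paper's exact algorithm or tuning (the paper also assumes access to an initial stabilizing controller, for which the zero initial gain below stands in only when $A$ is stable).

```python
# Sketch of certainty equivalent control with naive exploration (illustrative;
# not the authors' exact algorithm). Requires numpy and scipy.
import numpy as np
from scipy.linalg import solve_discrete_are

def ce_gain(A_hat, B_hat, Q, R):
    """LQR gain that is optimal for the *estimated* system (certainty equivalence)."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

def run_ce_control(A, B, Q, R, T=10_000, sigma_u=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d_x, d_u = B.shape
    x = np.zeros(d_x)
    K = np.zeros((d_u, d_x))        # stand-in for an initial stabilizing gain
    X, U, X_next = [], [], []       # logged transitions for least squares
    next_refit = 16                 # refine estimates on a doubling schedule
    for t in range(1, T + 1):
        u = K @ x + sigma_u * rng.standard_normal(d_u)      # CE input + noise
        x_new = A @ x + B @ u + rng.standard_normal(d_x)    # true, unknown dynamics
        X.append(x); U.append(u); X_next.append(x_new)
        x = x_new
        if t == next_refit:
            Z = np.hstack([np.array(X), np.array(U)])       # regress x' on (x, u)
            Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
            A_hat, B_hat = Theta.T[:, :d_x], Theta.T[:, d_x:]
            K = ce_gain(A_hat, B_hat, Q, R)                 # refined CE controller
            next_refit *= 2
    return K
```

On a small stable system, e.g. `run_ce_control(np.array([[0.9]]), np.array([[1.0]]), np.eye(1), np.eye(1))`, the returned gain should approach the true LQR gain as the estimates sharpen.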
Central to our upper and lower bounds is a new approach for controlling
perturbations of Riccati equations called the $\textit{self-bounding ODE
method}$, which we use to derive suboptimality bounds for the certainty
equivalent controller synthesized from estimated system dynamics. This in turn
enables regret upper bounds which hold for $\textit{any stabilizable instance}$
and scale with natural control-theoretic quantities.
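As a quick numerical illustration of what these perturbation bounds buy (an assumption-laden sketch on a random system, not the paper's construction): the excess steady-state cost of the certainty equivalent controller synthesized from a perturbed $A$ should shrink roughly like the square of the estimation error, which is the scaling that underlies $\sqrt{T}$-regret.

```python
# Illustrative check that certainty equivalent suboptimality is quadratic in
# the parameter error. The random system below is an assumption for demo only.
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def lqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def avg_cost(A, B, Q, R, K):
    """Steady-state cost per step of u = Kx under unit-covariance process noise."""
    A_cl = A + B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)  # P = A_cl' P A_cl + Q_cl
    return np.trace(P)

rng = np.random.default_rng(0)
d = 3
A = 0.5 * rng.standard_normal((d, d))   # assumed dynamics (demo only)
B, Q, R = np.eye(d), np.eye(d), np.eye(d)
J_opt = avg_cost(A, B, Q, R, lqr_gain(A, B, Q, R))
for eps in (1e-1, 1e-2, 1e-3):
    E = rng.standard_normal((d, d))
    E *= eps / np.linalg.norm(E)        # estimation error of size eps
    K_ce = lqr_gain(A + E, B, Q, R)     # CE controller from the wrong model
    gap = avg_cost(A, B, Q, R, K_ce) - J_opt
    print(f"eps={eps:.0e}  excess cost / eps^2 = {gap / eps**2:.3f}")
# The printed ratio staying roughly constant is the quadratic scaling.
```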
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$ bound of regret, while G-ESTS can achieve the $\tilde{O}(\ldots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$, where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z)
- Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u + d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z)
- Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z)
- Geometric Exploration for Online Control [38.87811800375421]
We study the control of an unknown linear dynamical system under general convex costs.
The objective is minimizing regret vs. the class of disturbance-feedback controllers.
arXiv Detail & Related papers (2020-10-25T18:11:28Z)
- Black-Box Control for Linear Dynamical Systems [40.352938608995174]
We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions.
Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret.
arXiv Detail & Related papers (2020-07-13T19:43:19Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)