Naive Exploration is Optimal for Online LQR
- URL: http://arxiv.org/abs/2001.09576v4
- Date: Wed, 4 Oct 2023 02:22:31 GMT
- Title: Naive Exploration is Optimal for Online LQR
- Authors: Max Simchowitz, Dylan J. Foster
- Abstract summary: We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem.
- Score: 49.681825576239355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of online adaptive control of the linear quadratic
regulator, where the true system parameters are unknown. We prove new upper and
lower bounds demonstrating that the optimal regret scales as
$\widetilde{\Theta}({\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T}})$, where $T$ is
the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space,
and $d_{\mathbf{x}}$ is the dimension of the system state. Notably, our lower
bounds rule out the possibility of a $\mathrm{poly}(\log{}T)$-regret algorithm,
which had been conjectured due to the apparent strong convexity of the problem.
Our upper bound is attained by a simple variant of $\textit{certainty
equivalent control}$, where the learner selects control inputs according to
the optimal controller for their estimate of the system while injecting
exploratory random noise. While this approach was shown to achieve
$\sqrt{T}$-regret by Mania et al. (2019), we show that if the learner
continually refines their estimates of the system matrices, the method attains
optimal dimension dependence as well.
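To make the scheme concrete, here is a minimal sketch of this certainty equivalent loop in Python: regress observed transitions to estimate $(A, B)$, play the optimal controller for the current estimate, and inject Gaussian exploration noise. The dynamics, the noise scale `sigma_u`, and the doubling refinement schedule are illustrative assumptions, not the paper's exact algorithm or tuning (the paper also assumes access to an initial stabilizing controller, for which the zero initial gain below stands in only when $A$ is stable).

```python
# Sketch of certainty equivalent control with naive exploration (illustrative;
# not the authors' exact algorithm). Requires numpy and scipy.
import numpy as np
from scipy.linalg import solve_discrete_are

def ce_gain(A_hat, B_hat, Q, R):
    """LQR gain that is optimal for the *estimated* system (certainty equivalence)."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

def run_ce_control(A, B, Q, R, T=10_000, sigma_u=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d_x, d_u = B.shape
    x = np.zeros(d_x)
    K = np.zeros((d_u, d_x))        # stand-in for an initial stabilizing gain
    X, U, X_next = [], [], []       # logged transitions for least squares
    next_refit = 16                 # refine estimates on a doubling schedule
    for t in range(1, T + 1):
        u = K @ x + sigma_u * rng.standard_normal(d_u)      # CE input + noise
        x_new = A @ x + B @ u + rng.standard_normal(d_x)    # true, unknown dynamics
        X.append(x); U.append(u); X_next.append(x_new)
        x = x_new
        if t == next_refit:
            Z = np.hstack([np.array(X), np.array(U)])       # regress x' on (x, u)
            Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
            A_hat, B_hat = Theta.T[:, :d_x], Theta.T[:, d_x:]
            K = ce_gain(A_hat, B_hat, Q, R)                 # refined CE controller
            next_refit *= 2
    return K
```

On a small stable system, e.g. `run_ce_control(np.array([[0.9]]), np.array([[1.0]]), np.eye(1), np.eye(1))`, the returned gain should approach the true LQR gain as the estimates sharpen.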
Central to our upper and lower bounds is a new approach for controlling
perturbations of Riccati equations called the $\textit{self-bounding ODE
method}$, which we use to derive suboptimality bounds for the certainty
equivalent controller synthesized from estimated system dynamics. This in turn
enables regret upper bounds which hold for $\textit{any stabilizable instance}$
and scale with natural control-theoretic quantities.
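As a quick numerical illustration of what these perturbation bounds buy (an assumption-laden sketch on a random system, not the paper's construction): the excess steady-state cost of the certainty equivalent controller synthesized from a perturbed $A$ should shrink roughly like the square of the estimation error, which is the scaling that underlies $\sqrt{T}$-regret.

```python
# Illustrative check that certainty equivalent suboptimality is quadratic in
# the parameter error. The random system below is an assumption for demo only.
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def lqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def avg_cost(A, B, Q, R, K):
    """Steady-state cost per step of u = Kx under unit-covariance process noise."""
    A_cl = A + B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)  # P = A_cl' P A_cl + Q_cl
    return np.trace(P)

rng = np.random.default_rng(0)
d = 3
A = 0.5 * rng.standard_normal((d, d))   # assumed dynamics (demo only)
B, Q, R = np.eye(d), np.eye(d), np.eye(d)
J_opt = avg_cost(A, B, Q, R, lqr_gain(A, B, Q, R))
for eps in (1e-1, 1e-2, 1e-3):
    E = rng.standard_normal((d, d))
    E *= eps / np.linalg.norm(E)        # estimation error of size eps
    K_ce = lqr_gain(A + E, B, Q, R)     # CE controller from the wrong model
    gap = avg_cost(A, B, Q, R, K_ce) - J_opt
    print(f"eps={eps:.0e}  excess cost / eps^2 = {gap / eps**2:.3f}")
# The printed ratio staying roughly constant is the quadratic scaling.
```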
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$ bound of regret, while G-ESTS can achieve the $\tilde{O}(\ldots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$, where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z)
- Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u + d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z)
- Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z)
- Geometric Exploration for Online Control [38.87811800375421]
We study the control of an unknown linear dynamical system under general convex costs.
The objective is minimizing regret vs. the class of disturbance-feedback controllers.
arXiv Detail & Related papers (2020-10-25T18:11:28Z)
- Black-Box Control for Linear Dynamical Systems [40.352938608995174]
We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions.
Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret.
arXiv Detail & Related papers (2020-07-13T19:43:19Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)