Geometric Exploration for Online Control
- URL: http://arxiv.org/abs/2010.13178v2
- Date: Thu, 29 Oct 2020 12:19:11 GMT
- Title: Geometric Exploration for Online Control
- Authors: Orestis Plevrakis and Elad Hazan
- Abstract summary: We study the control of an unknown linear dynamical system under general convex costs.
The objective is minimizing regret vs. the class of disturbance-feedback-controllers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the control of an \emph{unknown} linear dynamical system under
general convex costs. The objective is minimizing regret vs. the class of
disturbance-feedback-controllers, which encompasses all stabilizing
linear-dynamical-controllers. In this work, we first consider the case of known
cost functions, for which we design the first polynomial-time algorithm with
$n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the
dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and
improves upon the previous best known bound of $T^{2/3}$. The main component of
our algorithm is a novel geometric exploration strategy: we adaptively
construct a sequence of barycentric spanners in the policy space. Second, we
consider the case of bandit feedback, for which we give the first
polynomial-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret, building on Stochastic
Bandit Convex Optimization.
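The abstract's key primitive, a barycentric spanner, can be sketched for a finite point set via the classic swap algorithm of Awerbuch and Kleinberg. This is a generic illustration, not the paper's exploration routine: the function name, the approximation factor `C`, and the brute-force determinant search are all illustrative assumptions.

```python
import numpy as np

def barycentric_spanner(pts, C=2.0):
    """Approximate C-barycentric spanner of a finite spanning set
    (Awerbuch-Kleinberg swap algorithm). Rows of the result lie in
    `pts`, and every point in `pts` is a combination of the rows with
    coefficients bounded by C in absolute value."""
    pts = np.asarray(pts, dtype=float)
    d = pts.shape[1]
    B = np.eye(d)  # placeholder basis, replaced row by row below
    # Phase 1: for each row, greedily pick the point maximizing |det|.
    for i in range(d):
        dets = [abs(np.linalg.det(np.vstack([B[:i], [p], B[i + 1:]])))
                for p in pts]
        B[i] = pts[int(np.argmax(dets))]
    # Phase 2: swap in any point that grows |det| by more than a factor C.
    improved = True
    while improved:
        improved = False
        for i in range(d):
            base = abs(np.linalg.det(B))
            for p in pts:
                M = B.copy()
                M[i] = p
                if abs(np.linalg.det(M)) > C * base:
                    B, improved = M, True
                    break
    return B
```

Given the returned matrix `B` (spanner points as rows), the coefficients representing a point `p` solve `B.T @ c = p`, and the spanner property guarantees each `|c_i| <= C`; exploring only the spanner points therefore suffices to estimate the system response over the whole set.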
Related papers
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$.
arXiv Detail & Related papers (2022-02-12T21:55:44Z)
- Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems
We present an algorithm that achieves the optimal dynamic regret of $\widetilde{\mathcal{O}}(\sqrt{ST})$, where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z)
- Minimal Expected Regret in Linear Quadratic Control
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded by $\widetilde{O}((d_u+d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z)
- Naive Exploration is Optimal for Online LQR
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
- Improper Learning for Non-Stochastic Control
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.