Improper Learning for Non-Stochastic Control
- URL: http://arxiv.org/abs/2001.09254v3
- Date: Wed, 24 Jun 2020 23:48:03 GMT
- Title: Improper Learning for Non-Stochastic Control
- Authors: Max Simchowitz, Karan Singh, Elad Hazan
- Abstract summary: We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers.
- Score: 78.65807250350755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of controlling a possibly unknown linear dynamical
system with adversarial perturbations, adversarially chosen convex loss
functions, and partially observed states, known as non-stochastic control. We
introduce a controller parametrization based on the denoised observations, and
prove that applying online gradient descent to this parametrization yields a
new controller which attains sublinear regret vs. a large class of closed-loop
policies. In the fully-adversarial setting, our controller attains an optimal
regret bound of $\sqrt{T}$ when the system is known, and, when combined with an
initial stage of least-squares estimation, $T^{2/3}$ when the system is
unknown; both yield the first sublinear regret for the partially observed
setting.
Our bounds are the first in the non-stochastic control setting that compete
with \emph{all} stabilizing linear dynamical controllers, not just state
feedback. Moreover, in the presence of semi-adversarial noise containing both
stochastic and adversarial components, our controller attains the optimal
regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and
$\sqrt{T}$ when unknown. To our knowledge, this gives the first end-to-end
$\sqrt{T}$ regret for online Linear Quadratic Gaussian (LQG) control, and applies
in a more general setting with adversarial losses and semi-adversarial noise.
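To make the parametrization concrete, here is a minimal sketch of the underlying idea: the control is a linear function of past "Nature's y's" (observations with the effect of the learner's own inputs subtracted out), and the parameters are updated by online gradient descent. The system matrices, memory length m, step size eta, and the one-step surrogate gradient below are illustrative assumptions, not the paper's exact algorithm, which differentiates a counterfactual loss with memory and also handles the unknown-system case.

```python
# Hedged sketch of disturbance-response control with online gradient
# descent for a KNOWN stable system (A, B, C) and quadratic per-step loss.
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # x_{t+1} = A x_t + B u_t + w_t
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])               # y_t = C x_t + e_t
dx, du, dy = 2, 1, 1
m, T, eta = 5, 1000, 0.01                # memory, horizon, step size

# Markov parameters G[i] = C A^i B: how input u_{t-1-i} shows up in y_t.
# Truncating at m terms is accurate up to a geometric error since A is stable.
G = [C @ np.linalg.matrix_power(A, i) @ B for i in range(m)]

M = np.zeros((m, du, dy))                # disturbance-response parameters
x = np.zeros(dx)
u_hist = [np.zeros(du) for _ in range(m)]
ynat_hist = [np.zeros(dy) for _ in range(m)]

for t in range(T):
    y = C @ x + 0.01 * rng.normal(size=dy)
    # "Nature's y": remove the effect of our own past inputs, leaving the
    # observation that the noise alone would have produced.
    ynat = y - sum(G[i] @ u_hist[-(i + 1)] for i in range(m))
    ynat_hist.append(ynat)
    # The control is linear in the past m Nature's y's.
    u = sum(M[i] @ ynat_hist[-(i + 1)] for i in range(m))
    # Surrogate gradient step: differentiate only the instantaneous input
    # cost u'u through M (the paper uses a counterfactual loss with memory).
    for i in range(m):
        M[i] -= eta * np.outer(2 * u, ynat_hist[-(i + 1)])
    u_hist.append(u)
    x = A @ x + B @ u + 0.01 * rng.normal(size=dx)
```

The convexity that online gradient descent needs comes from the fact that u is linear in signals (the Nature's y's) that do not depend on the learner's parameter M.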
Related papers
- Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret [1.529943343419486]
We propose an online learning algorithm that adaptively designs a decentralized linear quadratic regulator when the system model is unknown a priori.
We show that our controller enjoys an expected regret that scales as $\sqrt{T}$ with the time horizon $T$ for the case of a partially nested information pattern.
arXiv Detail & Related papers (2022-10-17T09:29:01Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for ARX systems under both open-loop and closed-loop data collection (a schematic least-squares sketch follows this entry).
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
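As referenced above, a minimal sketch of the standard estimator behind such guarantees: ordinary least squares on lagged outputs and inputs. The scalar ARX(2, 2) model, coefficients, and open-loop Gaussian excitation are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: ARX identification by ordinary least squares on an
# assumed scalar ARX(p, q) model with open-loop Gaussian excitation.
import numpy as np

rng = np.random.default_rng(1)
p, q, T = 2, 2, 500
a_true = np.array([0.6, -0.2])   # autoregressive coefficients (stable)
b_true = np.array([1.0, 0.5])    # exogenous-input coefficients

# Simulate y_t = a' [y_{t-1},...,y_{t-p}] + b' [u_{t-1},...,u_{t-q}] + w_t.
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(max(p, q), T):
    y[t] = (a_true @ y[t - p:t][::-1] + b_true @ u[t - q:t][::-1]
            + 0.1 * rng.normal())

# Regression: each row stacks the p most recent outputs and q recent inputs.
rows = [np.concatenate([y[t - p:t][::-1], u[t - q:t][::-1]])
        for t in range(max(p, q), T)]
targets = y[max(p, q):]
theta, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
a_hat, b_hat = theta[:p], theta[p:]
print(a_hat, b_hat)   # approaches (a_true, b_true) as T grows
```

Closed-loop data collection changes the analysis (inputs become correlated with past noise), but the estimator itself is the same regression.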
- Meta-Learning Guarantees for Online Receding Horizon Learning Control [0.0]
We provide provable regret guarantees for an online meta-learning receding horizon control algorithm in an iterative control setting.
We show that the worst regret for learning within an iteration improves with experience of more iterations.
arXiv Detail & Related papers (2020-10-21T21:57:04Z)
- Learning Stabilizing Controllers for Unstable Linear Quadratic Regulators from a Single Trajectory [85.29718245299341]
We study linear controllers under quadratic costs, a model also known as the linear quadratic regulator (LQR).
We present two different semi-definite programs (SDPs) whose solutions yield a controller that stabilizes all systems within an ellipsoidal uncertainty set.
We propose an efficient data-dependent algorithm -- \textsc{eXploration} -- that, with high probability, quickly identifies a stabilizing controller (a generic LMI sketch follows this entry).
arXiv Detail & Related papers (2020-06-19T08:58:57Z)
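As referenced above, here is the generic flavor of SDP-based stabilization, sketched with cvxpy (assumed installed, with its bundled SCS solver): the classical Lyapunov LMI for a single known discrete-time system. The paper's actual programs certify stability over an entire ellipsoidal uncertainty set estimated from one trajectory; this nominal version makes no such robustness claim.

```python
# Hedged sketch: find K with rho(A + B K) < 1 via the standard LMI
#   [[Q, (A Q + B Y)^T], [A Q + B Y, Q]] > 0,  Q > 0,  K = Y Q^{-1},
# which by a Schur complement gives (A+BK) Q (A+BK)^T - Q < 0.
import cvxpy as cp
import numpy as np

A = np.array([[1.1, 0.5], [0.0, 1.2]])   # unstable nominal dynamics
B = np.array([[0.0], [1.0]])
n, m = 2, 1

Q = cp.Variable((n, n), symmetric=True)  # Q = P^{-1} in the Lyapunov proof
Y = cp.Variable((m, n))                  # substitution Y = K Q linearizes K
M = A @ Q + B @ Y
eps = 1e-6
constraints = [Q >> eps * np.eye(n),
               cp.bmat([[Q, M.T], [M, Q]]) >> eps * np.eye(2 * n)]
cp.Problem(cp.Minimize(0), constraints).solve(solver=cp.SCS)

K = Y.value @ np.linalg.inv(Q.value)
print("spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))  # < 1
```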
- Making Non-Stochastic Control (Almost) as Easy as Stochastic [27.736345095024276]
We show that the same regret rate is attainable even in the considerably more general non-stochastic control model.
We attain the optimal $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the dynamics are unknown to the learner.
arXiv Detail & Related papers (2020-06-10T16:00:14Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model (a schematic OFU loop follows this entry).
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
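As referenced above, a schematic optimism-in-the-face-of-uncertainty loop. For brevity it uses a fully observed LQR stand-in and a crude random search over a shrinking confidence ball; LqgOpt's partially observed setting, confidence-set construction, and optimistic optimization are all substantially more careful, so every constant and helper here is an assumption.

```python
# Hedged sketch: explore, estimate, and act on the most optimistic model.
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B):
    """Riccati gain and a crude cost proxy (trace of P) for a candidate model."""
    P = solve_discrete_are(A, B, np.eye(2), np.eye(1))
    return -np.linalg.solve(np.eye(1) + B.T @ P @ B, B.T @ P @ A), np.trace(P)

A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
A_hat, B_hat = np.zeros((2, 2)), np.zeros((2, 1))  # running LS estimates
x, Z, X = np.zeros(2), [], []
rng = np.random.default_rng(2)

for t in range(1, 2001):
    # Optimism: among candidates in a ball (shrinking like 1/sqrt(t)) around
    # the estimate, pick the model whose optimal cost proxy is smallest.
    best_cost, K = np.inf, np.zeros((1, 2))
    for _ in range(20):
        dA = rng.normal(size=(2, 2)) / np.sqrt(t)
        dB = rng.normal(size=(2, 1)) / np.sqrt(t)
        try:
            K_cand, cost = lqr_gain(A_hat + dA, B_hat + dB)
        except (np.linalg.LinAlgError, ValueError):
            continue                      # candidate not stabilizable; skip
        if cost < best_cost:
            best_cost, K = cost, K_cand
    u = K @ x + 0.1 * rng.normal(size=1)  # act, with a little exploration noise
    x_next = A_true @ x + B_true @ u + 0.1 * rng.normal(size=2)
    Z.append(np.concatenate([x, u])); X.append(x_next)
    theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X), rcond=None)  # refit
    A_hat, B_hat = theta[:2].T, theta[2:].T
    x = x_next
```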
- Regret Minimization in Partially Observable Linear Quadratic Control [91.43582419264763]
We study the problem of regret in partially observable linear quadratic control systems when the model dynamics are unknown a priori.
We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control.
arXiv Detail & Related papers (2020-01-31T22:35:08Z)
- Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)