Logarithmic Regret for Adversarial Online Control
- URL: http://arxiv.org/abs/2003.00189v3
- Date: Tue, 23 Jun 2020 08:17:44 GMT
- Title: Logarithmic Regret for Adversarial Online Control
- Authors: Dylan J. Foster and Max Simchowitz
- Abstract summary: We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
- Score: 56.12283443161479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new algorithm for online linear-quadratic control in a known
system subject to adversarial disturbances. Existing regret bounds for this
setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on
the disturbance process. We give the first algorithm with logarithmic regret
for arbitrary adversarial disturbance sequences, provided the state and control
costs are given by known quadratic functions. Our algorithm and analysis use a
characterization for the optimal offline control law to reduce the online
control problem to (delayed) online learning with approximate advantage
functions. Compared to previous techniques, our approach does not need to
control movement costs for the iterates, leading to logarithmic regret.
Related papers
- Regret Analysis of Policy Optimization over Submanifolds for Linearly
Constrained Online LQG [12.201535821920624]
We study online linear quadratic Gaussian problems with a given linear constraint imposed on the controller.
We propose online optimistic Newton on manifold (OONM) which provides an online controller based on the prediction on the first and second order information of the function sequence.
arXiv Detail & Related papers (2024-03-13T14:06:18Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Data-Driven Adversarial Online Control for Unknown Linear Systems [17.595231077524467]
We present a novel data-driven online adaptive control algorithm to address this online control problem.
Our algorithm guarantees an $tmO(T2/3)$ regret gradient bound with high probability, which matches the best-known regret bound for this problem.
arXiv Detail & Related papers (2023-08-16T04:05:22Z) - Safe Adaptive Learning-based Control for Constrained Linear Quadratic
Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Strongly Adaptive OCO with Memory [49.319621885036035]
We propose the first strongly adaptive algorithm for online learning with memory.
Our algorithm results in a strongly adaptive regret bound for the control of linear time-varying systems.
arXiv Detail & Related papers (2021-02-02T17:26:08Z) - Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with emphall stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.