Online Control of Unknown Time-Varying Dynamical Systems
- URL: http://arxiv.org/abs/2202.07890v1
- Date: Wed, 16 Feb 2022 06:57:14 GMT
- Title: Online Control of Unknown Time-Varying Dynamical Systems
- Authors: Edgar Minasyan, Paula Gradu, Max Simchowitz, Elad Hazan
- Abstract summary: We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online control of time-varying linear systems with unknown dynamics
in the nonstochastic control model. At a high level, we demonstrate that this
setting is \emph{qualitatively harder} than that of either unknown
time-invariant or known time-varying dynamics, and complement our negative
results with algorithmic upper bounds in regimes where sublinear regret is
possible. More specifically, we study regret bounds with respect to common
classes of policies: Disturbance Action (SLS), Disturbance Response (Youla),
and linear feedback policies. While these three classes are essentially
equivalent for LTI systems, we demonstrate that these equivalences break down
for time-varying systems.
We prove a lower bound that no algorithm can obtain sublinear regret with
respect to the first two classes unless a certain measure of system variability
also scales sublinearly in the horizon. Furthermore, we show that offline
planning over the state linear feedback policies is NP-hard, suggesting
hardness of the online learning problem.
On the positive side, we give an efficient algorithm that attains a sublinear
regret bound against the class of Disturbance Response policies up to the
aforementioned system variability term. In fact, our algorithm enjoys sublinear
\emph{adaptive} regret bounds, which is a strictly stronger metric than
standard regret and is more appropriate for time-varying systems. We sketch
extensions to Disturbance Action policies and partial observation, and propose
an inefficient algorithm for regret against linear state feedback policies.
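The abstract emphasizes that adaptive regret (worst-case regret over every contiguous interval) is strictly stronger than standard regret and better suited to time-varying systems. As an illustration only, not the paper's algorithm, the following toy sketch compares the two metrics for a fixed action sequence against an environment that switches halfway through the horizon; all names and the loss construction are hypothetical.

```python
# Toy comparison of standard vs. adaptive regret (illustrative sketch,
# not the paper's algorithm). Two fixed actions; the environment
# "switches" halfway, which only adaptive regret detects.

def standard_regret(losses, chosen):
    """Regret vs. the single best fixed action over the whole horizon."""
    T = len(losses)
    alg = sum(losses[t][chosen[t]] for t in range(T))
    best_fixed = min(sum(losses[t][a] for t in range(T))
                     for a in range(len(losses[0])))
    return alg - best_fixed

def adaptive_regret(losses, chosen):
    """Max over all contiguous intervals [s, e) of regret vs. the best
    fixed action on that interval -- a strictly stronger metric."""
    T = len(losses)
    worst = 0.0
    for s in range(T):
        for e in range(s + 1, T + 1):
            alg = sum(losses[t][chosen[t]] for t in range(s, e))
            best = min(sum(losses[t][a] for t in range(s, e))
                       for a in range(len(losses[0])))
            worst = max(worst, alg - best)
    return worst

# Environment: action 0 is best in the first half, action 1 in the second.
T = 10
losses = [[0.0, 1.0] if t < T // 2 else [1.0, 0.0] for t in range(T)]
chosen = [0] * T  # always play action 0

print(standard_regret(losses, chosen))  # 0.0: action 0 ties the best fixed action
print(adaptive_regret(losses, chosen))  # 5.0: the second-half interval exposes it
```

The point of the example: the constant policy has zero standard regret because no single fixed action beats it over the full horizon, yet it suffers linear-in-interval-length adaptive regret on the second half, mirroring why adaptive regret is the appropriate yardstick when the dynamics themselves vary over time.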
Related papers
- Non-asymptotic System Identification for Linear Systems with Nonlinear Policies (2023-06-17)
  This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies.
  We provide a non-asymptotic error bound for least-squares estimation when the data trajectory is generated by any nonlinear and/or time-varying policy.
- Best of Both Worlds in Online Control: Competitive Ratio and Policy Regret (2022-11-21)
  We show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight.
  We conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown.
- Implications of Regret on Stability of Linear Dynamical Systems (2022-11-14)
  In online learning, the quality of an agent's decisions is often quantified by the concept of regret.
  We show that for linear state feedback policies and linear systems subject to adversarial disturbances, linear regret implies stability in both time-varying and time-invariant settings.
- Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems (2022-06-09)
  This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical linear-quadratic system.
  We establish square-root-of-time regret bounds, indicating that the randomized certainty-equivalent policy quickly learns optimal control actions from a single state trajectory.
- Online estimation and control with optimal pathlength regret (2021-10-24)
  A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
  Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits.
- Non-stationary Online Learning with Memory and Non-stochastic Control (2021-02-07)
  We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
  We introduce dynamic policy regret as the performance measure for designing algorithms robust to non-stationary environments, and propose a novel algorithm for OCO with memory that provably enjoys optimal dynamic policy regret in terms of the time horizon, non-stationarity measure, and memory length.
- Adaptive Regret for Control of Time-Varying Dynamics (2020-07-08)
  We introduce the metric of adaptive regret to the field of control.
  Our main contribution is a novel efficient meta-algorithm; the key technical innovation is the first adaptive regret bound for the more general framework of online convex optimization with memory.
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems (2020-03-25)
  We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
  We present the first model estimation method with finite-time guarantees in both open- and closed-loop system identification, and show that AdaptOn is the first algorithm to achieve $\mathrm{polylog}(T)$ regret in adaptive control of unknown partially observable linear dynamical systems.
- Logarithmic Regret for Adversarial Online Control (2020-02-29)
  We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
  Our algorithm and analysis use a characterization of the offline control law to reduce the online control problem to (delayed) online learning.
- No-Regret Prediction in Marginally Stable Systems (2020-02-06)
  We consider the problem of online prediction in a marginally stable linear dynamical system.
  By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.