No-Regret Prediction in Marginally Stable Systems
- URL: http://arxiv.org/abs/2002.02064v3
- Date: Tue, 23 Jun 2020 19:08:58 GMT
- Title: No-Regret Prediction in Marginally Stable Systems
- Authors: Udaya Ghai, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang
- Abstract summary: We consider the problem of online prediction in a marginally stable linear dynamical system.
By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting.
- Score: 37.178095559618654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of online prediction in a marginally stable linear
dynamical system subject to bounded adversarial or (non-isotropic) stochastic
perturbations. This poses two challenges. Firstly, the system is in general
unidentifiable, so recent and classical results on parameter recovery do not
apply. Secondly, because we allow the system to be marginally stable, the state
can grow polynomially with time; this causes standard regret bounds in online
convex optimization to be vacuous. In spite of these challenges, we show that
the online least-squares algorithm achieves sublinear regret (improvable to
polylogarithmic in the stochastic setting), with polynomial dependence on the
system's parameters. This requires a refined regret analysis, including a
structural lemma showing the current state of the system to be a small linear
combination of past states, even if the state grows polynomially. By applying
our techniques to learning an autoregressive filter, we also achieve
logarithmic regret in the partially observed setting under Gaussian noise, with
polynomial dependence on the memory of the associated Kalman filter.
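As a concrete illustration of the algorithm the abstract describes, below is a minimal Python sketch of online least-squares prediction for a fully observed system x_{t+1} = A x_t + w_t. It is a toy under stated assumptions, not the paper's implementation: the function name, the ridge parameter lam, and the rotation-matrix demo (chosen because a rotation is marginally stable, with all eigenvalues on the unit circle) are illustrative choices.

```python
import numpy as np

def online_least_squares_prediction(states, lam=1.0):
    """One-step prediction of x_{t+1} from x_1..x_t via ridge-regularized
    online least squares: A_t = argmin_A sum_s ||x_{s+1} - A x_s||^2 + lam*||A||_F^2."""
    T, d = states.shape
    P = lam * np.eye(d)                # lam*I + sum_s x_s x_s^T
    Q = np.zeros((d, d))               # sum_s x_{s+1} x_s^T
    preds = np.zeros((T - 1, d))
    for t in range(T - 1):
        A_hat = Q @ np.linalg.inv(P)   # closed-form ridge estimate of A
        preds[t] = A_hat @ states[t]   # commit to a prediction before x_{t+1} arrives
        P += np.outer(states[t], states[t])       # then fold in the revealed pair
        Q += np.outer(states[t + 1], states[t])
    return preds

# Toy demo: a rotation is marginally stable (all eigenvalues on the unit
# circle), so the state never decays, yet prediction error still shrinks.
rng = np.random.default_rng(0)
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
T = 500
x = np.zeros((T, 2))
x[0] = rng.normal(size=2)
for t in range(T - 1):
    x[t + 1] = A @ x[t] + 0.1 * rng.normal(size=2)  # bounded stochastic noise
preds = online_least_squares_prediction(x)
errs = np.sum((preds - x[1:]) ** 2, axis=1)
print("mean squared error, first vs last 100 steps:",
      errs[:100].mean(), errs[-100:].mean())
```

Recomputing the inverse every step costs O(d^3); a practical variant would maintain P^{-1} directly with Sherman-Morrison rank-one updates.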
Related papers
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online MPC algorithm that combines the certainty-equivalence principle with polytopic tubes.
We analyze the regret of the algorithm relative to a comparator that knows the system dynamics from the start.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Stability Bounds for Learning-Based Adaptive Control of Discrete-Time Multi-Dimensional Stochastic Linear Systems with Input Constraints [3.8004168340068336]
We consider the problem of adaptive stabilization for discrete-time, multi-dimensional systems with bounded control inputs and unbounded disturbances.
We propose a certainty-equivalent control scheme which combines online parameter estimation with saturated linear control.
arXiv Detail & Related papers (2023-04-02T16:38:13Z)
- Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems [0.0]
This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical linear-quadratic system.
We establish square-root-of-time regret bounds, indicating that a randomized certainty-equivalent policy learns optimal control actions quickly from a single state trajectory.
arXiv Detail & Related papers (2022-06-09T11:47:36Z)
- Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- Sparse Identification of Nonlinear Dynamical Systems via Reweighted $\ell_1$-regularized Least Squares [62.997667081978825]
This work proposes an iterative sparse-regularized regression method to recover governing equations of nonlinear systems from noisy state measurements.
The aim of this work is to improve the accuracy and robustness of the method in the presence of state measurement noise.
arXiv Detail & Related papers (2020-05-27T08:30:15Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters to within their confidence intervals, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Online Learning of the Kalman Filter with Logarithmic Regret [2.0305676256390934]
We show that it is possible to achieve regret of order $\mathrm{polylog}(N)$ with high probability, where $N$ is the number of observations collected.
This is achieved using an online least-squares algorithm that exploits the approximately linear relation between future and past observations (a sketch of this idea follows the list).
arXiv Detail & Related papers (2020-02-12T18:31:31Z)
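The last entry's mechanism, predicting the next observation as a learned linear function of a short window of past observations, admits an equally small sketch. Everything here is an assumption made for illustration: the function name, the window length m, and the ridge parameter lam are not from the paper.

```python
import numpy as np

def ar_least_squares_prediction(obs, m=5, lam=1.0):
    """Predict y_t from the last m observations with an autoregressive filter
    whose coefficients are fit online by ridge-regularized least squares."""
    T, d = obs.shape
    p = m * d
    P = lam * np.eye(p)            # regularized Gram matrix of the AR features
    Q = np.zeros((d, p))           # cross-moments between targets and features
    preds = np.zeros((T - m, d))
    for t in range(m, T):
        z = obs[t - m:t][::-1].reshape(p)  # feature: last m observations, newest first
        G = Q @ np.linalg.inv(P)           # current AR coefficient estimate
        preds[t - m] = G @ z               # predict y_t before it is revealed
        P += np.outer(z, z)                # then update with the revealed pair
        Q += np.outer(obs[t], z)
    return preds
```

Under Gaussian noise the next observation is approximately a linear function of its recent past (via the steady-state Kalman filter), which is what makes this regression sensible; the window length m plays the role of the filter's memory.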