Predictive Linear Online Tracking for Unknown Targets
- URL: http://arxiv.org/abs/2402.10036v3
- Date: Thu, 13 Jun 2024 13:04:41 GMT
- Title: Predictive Linear Online Tracking for Unknown Targets
- Authors: Anastasios Tsiamis, Aren Karapetyan, Yueshan Li, Efe C. Balta, John Lygeros
- Abstract summary: We study the problem of online tracking in linear control systems, where the objective is to follow a moving target.
We propose a new algorithm, called predictive linear online tracking (PLOT).
We implement PLOT on a real quadrotor and provide open-source software, thus showcasing one of the first successful applications of online control methods on real hardware.
- Score: 5.047136039782827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of online tracking in linear control systems, where the objective is to follow a moving target. Unlike classical tracking control, the target is unknown, non-stationary, and its state is revealed sequentially, thus, fitting the framework of online non-stochastic control. We consider the case of quadratic costs and propose a new algorithm, called predictive linear online tracking (PLOT). The algorithm uses recursive least squares with exponential forgetting to learn a time-varying dynamic model of the target. The learned model is used in the optimal policy under the framework of receding horizon control. We show the dynamic regret of PLOT scales with $\mathcal{O}(\sqrt{TV_T})$, where $V_T$ is the total variation of the target dynamics and $T$ is the time horizon. Unlike prior work, our theoretical results hold for non-stationary targets. We implement PLOT on a real quadrotor and provide open-source software, thus, showcasing one of the first successful applications of online control methods on real hardware.
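The two ingredients named in the abstract are (i) recursive least squares (RLS) with exponential forgetting, which tracks a time-varying linear model of the target, and (ii) receding horizon control, which rolls that model forward to plan. The snippet below is a minimal sketch of ingredient (i), not the authors' released code; the forgetting factor `lam`, the initialization scale `eps`, and the toy rotating target are illustrative assumptions.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with exponential forgetting.

    Fits a linear target model s_{t+1} ~ A s_t, discounting past
    transitions by a factor lam in (0, 1] so the estimate can track
    non-stationary target dynamics.
    """

    def __init__(self, dim, lam=0.95, eps=1e3):
        self.lam = lam
        self.A = np.zeros((dim, dim))   # current dynamics estimate
        self.P = eps * np.eye(dim)      # inverse of the discounted Gram matrix

    def update(self, s_prev, s_next):
        """Incorporate one observed target transition s_prev -> s_next."""
        Ps = self.P @ s_prev
        k = Ps / (self.lam + s_prev @ Ps)              # RLS gain
        self.A += np.outer(s_next - self.A @ s_prev, k)
        self.P = (self.P - np.outer(k, Ps)) / self.lam

    def forecast(self, s, horizon):
        """Predict future target states by rolling the model forward,
        as a receding-horizon tracker would before re-planning."""
        preds = []
        for _ in range(horizon):
            s = self.A @ s
            preds.append(s.copy())
        return np.array(preds)

# Toy demo: learn a slowly rotating 2-D target online.
rng = np.random.default_rng(0)
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
rls, s = ForgettingRLS(dim=2), np.array([1.0, 0.0])
for _ in range(200):
    s_next = A_true @ s + 0.01 * rng.standard_normal(2)
    rls.update(s, s_next)
    s = s_next
print(rls.forecast(s, horizon=5))   # multi-step target predictions
```

In PLOT these multi-step predictions feed a certainty-equivalent receding-horizon quadratic tracker; here they are only printed.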
Related papers
- Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for ARX systems under both open-loop and closed-loop data collection (a minimal ARX least-squares sketch follows this entry).
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
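For context on the ARX entry above: an ARX (autoregressive with exogenous input) model predicts the next output from a window of past outputs and inputs, and finite-time identification results analyze the least-squares fit of its coefficients. A minimal open-loop sketch follows; the orders `na`, `nb`, the toy system, and the noise level are assumptions for illustration, not values from the paper.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Ordinary least-squares fit of an ARX model
        y[t] = a1*y[t-1] + ... + a_na*y[t-na]
             + b1*u[t-1] + ... + b_nb*u[t-nb] + noise.
    Returns the coefficient vector [a1..a_na, b1..b_nb]."""
    start = max(na, nb)
    Phi = np.array([np.concatenate([y[t-na:t][::-1], u[t-nb:t][::-1]])
                    for t in range(start, len(y))])
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    return theta

# Open-loop data collection from a toy stable ARX system.
rng = np.random.default_rng(1)
u = rng.standard_normal(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = (1.2 * y[t-1] - 0.4 * y[t-2]
            + 0.5 * u[t-1] + 0.1 * u[t-2]
            + 0.01 * rng.standard_normal())
print(fit_arx(y, u))   # should be close to [1.2, -0.4, 0.5, 0.1]
```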
- Regret Analysis of Distributed Online LQR Control for Unknown LTI Systems [8.832969171530056]
We study the distributed online linear quadratic regulator (LQR) problem for linear time-invariant (LTI) systems with unknown dynamics.
We propose a distributed variant of the online LQR algorithm where each agent computes its system estimate during an exploration stage.
We prove that the regret of our proposed algorithm scales as $\tilde{O}(T^{2/3})$, implying consensus of the network over time (a sketch of the exploration-stage estimate follows this entry).
arXiv Detail & Related papers (2021-05-15T23:02:58Z)
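To illustrate the exploration stage mentioned in the distributed LQR entry above: each agent can form a least-squares estimate of the unknown system matrices from a batch of excited trajectories before committing to a controller. A minimal single-agent sketch under that reading; the function name, shapes, and toy system are illustrative, not from the paper.

```python
import numpy as np

def estimate_AB(xs, us):
    """Least-squares estimate of (A, B) from exploration data
    generated by x_{t+1} = A x_t + B u_t + w_t.

    xs: (T, n) array of states, us: (T, m) array of inputs.
    """
    X_next = xs[1:].T                       # (n, T-1) next states
    Z = np.hstack([xs[:-1], us[:-1]]).T     # (n+m, T-1) stacked regressors
    Theta = X_next @ np.linalg.pinv(Z)      # minimizes ||X_next - Theta Z||_F
    n = xs.shape[1]
    return Theta[:, :n], Theta[:, n:]       # A_hat, B_hat

# Demo: random excitation of a known system, then recovery of (A, B).
rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
xs, us = np.zeros((300, 2)), rng.standard_normal((300, 1))
for t in range(299):
    xs[t + 1] = A @ xs[t] + B @ us[t] + 0.01 * rng.standard_normal(2)
A_hat, B_hat = estimate_AB(xs, us)
```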
- Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret [0.0]
We present the first model-free algorithm that achieves similar regret guarantees.
Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space.
arXiv Detail & Related papers (2021-02-25T00:25:41Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization (a minimal target-network update sketch follows this entry).
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
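The "deadly triad" entry above revolves around the target-network device: the bootstrapped temporal-difference target is built from a lagged copy of the weights rather than the weights currently being updated. A minimal linear Q-learning sketch of that device; the step size, sync schedule, and feature inputs are illustrative assumptions, not the paper's algorithm (which modifies the plain device to obtain convergence guarantees).

```python
import numpy as np

def q_step(w, w_target, phi, phi_next_best, r, gamma=0.99, alpha=0.05):
    """One semi-gradient Q-learning update with a target network.

    phi:           features of the current state-action pair
    phi_next_best: features of the greedy next state-action pair
    The bootstrap term uses the lagged weights w_target, which is
    the stabilizing device the paper analyzes and modifies.
    """
    td_error = r + gamma * (w_target @ phi_next_best) - w @ phi
    return w + alpha * td_error * phi   # only the online weights move

# Periodic synchronization of the lagged copy:
# if step % sync_period == 0: w_target = w.copy()
```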
- Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach [3.752600874088677]
We use a linear policy for realizing end-foot trajectories on the quadruped robot Stoch 2.
In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs.
The resulting walking is robust to terrain slope variations and external pushes (a sketch of such a linear policy follows this entry).
arXiv Detail & Related papers (2020-10-30T16:02:08Z)
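To make the linear-policy idea in the quadruped entry above concrete: the policy is a single matrix mapping the sensed torso orientation and terrain slope to the parameters that shape each end-foot trajectory. A minimal sketch; the input/output dimensions and the zero-initialized matrix are placeholders, not Stoch 2's trained values.

```python
import numpy as np

# Placeholder dimensions: 4 inputs (torso roll, torso pitch, slope
# along x, slope along y) and 8 outputs (e.g., two trajectory-shaping
# parameters per leg). The real robot's trained matrix differs.
K = np.zeros((8, 4))

def end_foot_params(torso_roll, torso_pitch, slope_x, slope_y):
    """Linear feedback policy: map torso orientation and terrain
    slope to end-foot trajectory parameters."""
    x = np.array([torso_roll, torso_pitch, slope_x, slope_y])
    return K @ x
```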
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model (a simplified optimistic-selection sketch follows this entry).
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
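The optimism principle in the LqgOpt entry above can be illustrated over a finite candidate set: among models consistent with the data, deploy the controller of the model that promises the lowest cost. LqgOpt itself works with a continuum confidence set for partially observed systems; the finite-set, fully observed LQR version below is a deliberate simplification.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def optimistic_gain(candidate_models, Q, R):
    """Optimism in the face of uncertainty, finite-set version:
    return the LQR gain of the candidate model with the lowest
    (proxy) optimal cost."""
    best_cost, best_K = np.inf, None
    for A, B in candidate_models:
        P = solve_discrete_are(A, B, Q, R)                 # Riccati solution
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # LQR gain
        cost = np.trace(P)                                 # optimistic cost proxy
        if cost < best_cost:
            best_cost, best_K = cost, K
    return best_K   # apply u_t = -best_K @ x_t
```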
- Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)