Regret Analysis of Policy Optimization over Submanifolds for Linearly
Constrained Online LQG
- URL: http://arxiv.org/abs/2403.08553v1
- Date: Wed, 13 Mar 2024 14:06:18 GMT
- Authors: Ting-Jui Chang and Shahin Shahrampour
- Abstract summary: We study online linear quadratic Gaussian problems with a given linear constraint imposed on the controller.
We propose online optimistic Newton on manifold (OONM), which provides an online controller based on predictions of the first- and second-order information of the function sequence.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in online optimization and control have provided novel
tools to study online linear quadratic regulator (LQR) problems, where cost
matrices vary adversarially over time. However, the controller
parameterizations of existing works may not satisfy practical conditions, such
as sparsity due to physical connections. In this work, we study online linear
quadratic Gaussian problems with a given linear constraint imposed on the
controller. Inspired by the recent work of [1], which proposed, for linearly
constrained policy optimization of an offline LQR, a second-order method
equipped with a Riemannian metric that emerges naturally in the context of
optimal control problems, we propose online optimistic Newton on manifold
(OONM), which provides an online controller based on predictions of the first-
and second-order information of the function sequence. To quantify the proposed
algorithm, we leverage the notion of regret, defined as the sub-optimality of
its cumulative cost relative to that of a (locally) minimizing controller
sequence, and provide a regret bound in terms of the path-length of the
minimizer sequence. Simulation results are also provided to verify the
properties of OONM.
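The optimistic idea in the abstract, stepping against a *prediction* of the next cost's first- and second-order information, can be illustrated with a minimal sketch. This is not the paper's algorithm: OONM works on a Riemannian submanifold of linearly constrained controllers, while the sketch below is a simplified Euclidean analogue, and all function and variable names here are illustrative.

```python
import numpy as np

def optimistic_newton_step(x, grad_pred, hess_pred, reg=1e-8):
    """One optimistic Newton-style update: move against the PREDICTED
    gradient, preconditioned by the PREDICTED Hessian (regularized so
    the linear solve is well-posed)."""
    H = hess_pred + reg * np.eye(len(x))
    return x - np.linalg.solve(H, grad_pred)

# For a quadratic cost f(x) = 0.5 * x'Ax - b'x whose prediction happens
# to be exact, a single step lands (up to regularization) on the
# minimizer A^{-1} b.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 4.0])
x0 = np.zeros(2)
x1 = optimistic_newton_step(x0, A @ x0 - b, A)  # close to [1.0, 1.0]
```

When predictions are inexact, the update incurs an error each round; bounding the accumulation of such errors against the path-length of the minimizer sequence is, roughly, what a dynamic regret analysis of this kind does.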
Related papers
- Sub-linear Regret in Adaptive Model Predictive Control
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints
This paper studies the problem of online performance optimization of constrained closed-loop control systems.
A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution.
arXiv Detail & Related papers (2023-04-12T18:37:52Z)
- Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version
We propose and analyze a new method for online linear quadratic regulator (LQR) control with a priori unknown time-varying cost matrices.
Our novel method involves using the available cost matrices to predict the optimal trajectory, and a tracking controller to drive the system towards it.
We show in simulations that our proposed method offers improved performance compared to other previously proposed online LQR methods.
arXiv Detail & Related papers (2023-02-21T02:48:57Z)
- Augmented Lagrangian Methods for Time-varying Constrained Online Convex Optimization
We consider online convex optimization (OCO) with time-varying loss and constraint functions.
We first develop a class of model-based augmented Lagrangian methods (MALM) for time-varying functional constrained OCO.
Numerical results for several examples of constrained OCO are presented to demonstrate the efficiency of the proposed algorithms.
arXiv Detail & Related papers (2022-05-19T14:03:25Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Logarithmic Regret for Adversarial Online Control
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)
- Improper Learning for Non-Stochastic Control
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.