Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems
- URL: http://arxiv.org/abs/2406.07746v1
- Date: Tue, 11 Jun 2024 22:04:59 GMT
- Title: Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems
- Authors: Jafar Abbaszadeh Chekan, Cedric Langbort,
- Abstract summary: First algorithm for Linear Quadratic (LQ) control problem features a regret of $mathcalO(sqrtT)$
We propose the first fully adaptive algorithm that controls the number of policy updates (i.e. tunes the exploration-exploitation trade-off)
We show that through careful exploration-exploitation trade-off adjustment there is no need to commit to the widely-used notion of strong sequential stability.
- Score: 0.2455468619225742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The first algorithm for the Linear Quadratic (LQ) control problem with an unknown system model, featuring a regret of $\mathcal{O}(\sqrt{T})$, was introduced by Abbasi-Yadkori and Szepesv\'ari (2011). Recognizing the computational complexity of this algorithm, subsequent efforts (see Cohen et al. (2019), Mania et al. (2019), Faradonbeh et al. (2020a), and Kargin et al.(2022)) have been dedicated to proposing algorithms that are computationally tractable while preserving this order of regret. Although successful, the existing works in the literature lack a fully adaptive exploration-exploitation trade-off adjustment and require a user-defined value, which can lead to overall regret bound growth with some factors. In this work, noticing this gap, we propose the first fully adaptive algorithm that controls the number of policy updates (i.e., tunes the exploration-exploitation trade-off) and optimizes the upper-bound of regret adaptively. Our proposed algorithm builds on the SDP-based approach of Cohen et al. (2019) and relaxes its need for a horizon-dependant warm-up phase by appropriately tuning the regularization parameter and adding an adaptive input perturbation. We further show that through careful exploration-exploitation trade-off adjustment there is no need to commit to the widely-used notion of strong sequential stability, which is restrictive and can introduce complexities in initialization.
Related papers
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Global Convergence of Receding-Horizon Policy Search in Learning
Estimator Designs [3.0811185425377743]
We introduce the receding-horizon policy estimator (RHPG) algorithm.
RHPG is the first algorithm with provable global convergence in learning optimal linear policy estimator.
arXiv Detail & Related papers (2023-09-09T16:03:49Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $tildemathcal O(K2/3)$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Projection-free Adaptive Regret with Membership Oracles [31.422532403048738]
Most iterative algorithms require the computation of projections onto convex sets, which can be computationally expensive.
Recent work by GK22 gave sublinear adaptive regret guarantees with projection free algorithms based on the Frank Wolfe approach.
We give projection-free algorithms that are based on a different technique, inspired by Mhammedi22, that replaces projections by set-membership computations.
arXiv Detail & Related papers (2022-11-22T23:53:06Z) - ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive
Non-Stationary Dueling Bandits [20.128001589147512]
We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.
We show a near-optimal $tildeO(sqrtStexttCW T)$ dynamic regret bound, where $StexttCW$ is the number of times the Condorcet winner changes in $T$ rounds.
arXiv Detail & Related papers (2022-10-25T20:26:02Z) - Planning and Learning with Adaptive Lookahead [74.39132848733847]
Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z) - Finite-time System Identification and Adaptive Control in Autoregressive
Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z) - An Efficient Algorithm for Cooperative Semi-Bandits [0.0]
We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm.
We show that the expected regret of our algorithm after T time steps is of order QT log(k)(k$alpha$ 1 /Q + m), where Q is the total activation probability mass.
arXiv Detail & Related papers (2020-10-05T07:08:26Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.