Best of Both Worlds in Online Control: Competitive Ratio and Policy
Regret
- URL: http://arxiv.org/abs/2211.11219v1
- Date: Mon, 21 Nov 2022 07:29:08 GMT
- Title: Best of Both Worlds in Online Control: Competitive Ratio and Policy
Regret
- Authors: Gautam Goel, Naman Agarwal, Karan Singh, Elad Hazan
- Abstract summary: We show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight, and optimal competitive ratio up to an additive correction that grows sublinearly in the time horizon.
We conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown.
- Score: 61.59646565655169
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the fundamental problem of online control of a linear dynamical
system from two different viewpoints: regret minimization and competitive
analysis. We prove that the optimal competitive policy is well-approximated by
a convex parameterized policy class, known as disturbance-action control
(DAC) policies. Using this structural result, we show that several recently
proposed online control algorithms achieve the best of both worlds: sublinear
regret vs. the best DAC policy selected in hindsight, and optimal competitive
ratio, up to an additive correction which grows sublinearly in the time
horizon. We further conclude that sublinear regret vs. the optimal competitive
policy is attainable when the linear dynamical system is unknown, and even when
a stabilizing controller for the dynamics is not available a priori.
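For context, the two performance measures the abstract compares are usually stated as below; these are the standard definitions from the online control literature, not notation taken from this paper. Here $c_t$ is the per-round cost, $\Pi$ is the benchmark policy class (e.g., DAC policies), and OPT is the optimal clairvoyant policy.

```latex
\text{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr),
\qquad
\text{CR} \;=\; \sup_{w_{1:T}} \frac{\mathrm{cost}_{\mathrm{ALG}}(w_{1:T})}{\mathrm{cost}_{\mathrm{OPT}}(w_{1:T})}.
```

The DAC class itself is typically parameterized as a fixed stabilizing linear feedback plus a linear map over a window of past disturbances. A minimal sketch of that standard parameterization follows; the class name, shapes, and window length m are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class DACPolicy:
    """Sketch of a disturbance-action control (DAC) policy:
        u_t = K x_t + sum_{i=1}^{m} M_i w_{t-i}
    K is a fixed stabilizing feedback gain; M_1..M_m are the learnable
    parameters that online control algorithms update from round to round.
    """

    def __init__(self, K, M):
        self.K = K    # shape (du, dx)
        self.M = M    # list of m matrices, each of shape (du, dw)

    def action(self, x, past_w):
        """past_w[-1] is the most recent disturbance w_{t-1}."""
        u = self.K @ x
        for i, M_i in enumerate(self.M, start=1):
            if i <= len(past_w):
                u = u + M_i @ past_w[-i]
        return u

# Toy usage: with known (or estimated) dynamics x_{t+1} = A x_t + B u_t + w_t,
# the disturbance is recovered after each step as w_t = x_{t+1} - A x_t - B u_t.
dx, du, m = 2, 1, 3
K = np.zeros((du, dx))                      # placeholder stabilizing gain
M = [np.zeros((du, dx)) for _ in range(m)]  # here disturbances live in state space
policy = DACPolicy(K, M)
u_0 = policy.action(np.zeros(dx), past_w=[])
```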
Related papers
- Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online
Reinforcement Learning [71.02384943570372]
Family Offline-to-Online RL (FamO2O) is a framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances.
FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark.
arXiv Detail & Related papers (2023-10-27T08:30:54Z) - Online Nonstochastic Model-Free Reinforcement Learning [35.377261344335736]
We investigate robust model-free reinforcement learning algorithms for environments that may be dynamic or adversarial.
We provide efficient algorithms for optimizing these policies.
These are the best-known results with no dependence on the state-space dimension.
arXiv Detail & Related papers (2023-05-27T19:02:55Z) - Introduction to Online Nonstochastic Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary.
The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z) - Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - Online Algorithms and Policies Using Adaptive and Machine Learning
Approaches [0.22020053359163297]
Two classes of nonlinear dynamic systems are considered, both of which are control-affine.
We propose a combination of adaptive control in the inner loop with a Reinforcement Learning based policy in the outer loop, suitably chosen to ensure stability and optimality for the nominal dynamics.
In addition to establishing a stability guarantee with real-time control, the AC-RL controller is also shown to lead to parameter learning with persistent excitation.
arXiv Detail & Related papers (2021-05-13T22:51:25Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Competitive Policy Optimization [137.17299766844596]
We propose a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates.
Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective.
We empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games.
arXiv Detail & Related papers (2020-06-18T15:31:09Z) - The Power of Linear Controllers in LQR Control [39.76359052907755]
We compute the policy regret between three distinct control policies.
We show that the cost of the optimal offline linear policy converges to the cost of the optimal online policy.
Although we focus on the setting where the noise is stochastic, our results imply new lower bounds on the policy regret achievable when the noise is chosen by an adaptive adversary.
arXiv Detail & Related papers (2020-02-07T00:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.