The Power of Linear Controllers in LQR Control
- URL: http://arxiv.org/abs/2002.02574v1
- Date: Fri, 7 Feb 2020 00:58:49 GMT
- Title: The Power of Linear Controllers in LQR Control
- Authors: Gautam Goel, Babak Hassibi
- Abstract summary: We compute the policy regret between three distinct control policies.
We show that the cost of the optimal offline linear policy converges to the cost of the optimal online policy.
Although we focus on the setting where the noise is stochastic, our results imply new lower bounds on the policy regret achievable when the noise is chosen by an adaptive adversary.
- Score: 39.76359052907755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Linear Quadratic Regulator (LQR) framework considers the problem of
regulating a linear dynamical system perturbed by environmental noise. We
compute the policy regret between three distinct control policies: i) the
optimal online policy, whose linear structure is given by the Riccati
equations; ii) the optimal offline linear policy, which is the best linear
state feedback policy given the noise sequence; and iii) the optimal offline
policy, which selects the globally optimal control actions given the noise
sequence. We fully characterize the optimal offline policy and show that it has
a recursive form in terms of the optimal online policy and future disturbances.
We also show that the cost of the optimal offline linear policy converges to the
cost of the optimal online policy as the time horizon grows large, and
consequently the optimal offline linear policy incurs linear regret relative to
the optimal offline policy, even in the optimistic setting where the noise is
drawn i.i.d. from a known distribution. Although we focus on the setting where
the noise is stochastic, our results also imply new lower bounds on the policy
regret achievable when the noise is chosen by an adaptive adversary.
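To make the comparison concrete, below is a minimal numerical sketch (not from the paper; the dynamics A, B, cost weights Q, R, horizon T, and noise scale are illustrative assumptions). It builds the optimal online policy via the backward Riccati recursion and compares its realized cost, on one sampled noise sequence, against the optimal offline (clairvoyant) policy, obtained here by minimizing the quadratic cost over all control sequences with the noise known in advance.

```python
# Sketch: online (Riccati) vs. offline (clairvoyant) LQR cost on one noise draw.
# All problem data below are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 1, 100                       # state dim, input dim, horizon
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
Q = np.eye(n)                             # state cost weight
R = 0.1 * np.eye(m)                       # control cost weight
W = 0.1 * rng.standard_normal((T, n))     # i.i.d. Gaussian disturbances

# (i) Optimal online policy: backward Riccati recursion, u_t = -K_t x_t.
P = [None] * (T + 1)
K = [None] * T
P[T] = Q.copy()
for t in range(T - 1, -1, -1):
    K[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])

def cost_online(x0):
    """Roll the Riccati state feedback forward through the sampled noise."""
    x, J = x0.copy(), 0.0
    for t in range(T):
        u = -K[t] @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + W[t]
    return J + x @ Q @ x                  # terminal state cost

def cost_offline(x0):
    """Clairvoyant optimum: with W known, the cost is a convex quadratic in
    the stacked controls, so a single least-squares solve suffices."""
    # Stacked states x_1..x_T = c + G @ U, with U = (u_0, ..., u_{T-1}).
    G = np.zeros((T * n, T * m))
    c = np.zeros(T * n)
    xt = x0.copy()
    for t in range(T):
        xt = A @ xt + W[t]                # zero-input (noise + x0) part of x_{t+1}
        c[t * n:(t + 1) * n] = xt
        for s in range(t + 1):
            G[t * n:(t + 1) * n, s * m:(s + 1) * m] = (
                np.linalg.matrix_power(A, t - s) @ B)
    Qh = np.kron(np.eye(T), np.linalg.cholesky(Q).T)   # x'Qx = ||Qh_block x||^2
    Rh = np.kron(np.eye(T), np.linalg.cholesky(R).T)
    M = np.vstack([Qh @ G, Rh])
    b = np.concatenate([-Qh @ c, np.zeros(T * m)])
    U = np.linalg.lstsq(M, b, rcond=None)[0]
    r = M @ U - b
    return x0 @ Q @ x0 + r @ r

x0 = np.zeros(n)
print("optimal online (Riccati) cost:", cost_online(x0))
print("optimal offline (clairvoyant):", cost_offline(x0))
```

The clairvoyant cost is never larger than the online cost, and the gap between the two realized costs is one instance of the policy regret the paper analyzes; the paper's third comparison point, the best offline *linear* state-feedback policy, sits between these two in cost.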
Related papers
- Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming
for Policy Optimization in Mixed Discrete-Continuous MDPs [23.87856533426793]
CGPO provides bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics.
CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions.
We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control and the management of a system of water reservoirs.
arXiv Detail & Related papers (2024-01-20T07:12:57Z) - Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators [11.400431211239958]
We study the rate of convergence to the globally optimal policy in nearly linear-quadratic regulator systems.
We propose a policy gradient algorithm that is guaranteed to converge to the globally optimal policy.
arXiv Detail & Related papers (2023-03-15T08:08:02Z) - Best of Both Worlds in Online Control: Competitive Ratio and Policy
Regret [61.59646565655169]
We show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight.
We conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown.
arXiv Detail & Related papers (2022-11-21T07:29:08Z) - Introduction to Online Nonstochastic Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary.
The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z) - Offline RL Policies Should be Trained to be Adaptive [89.8580376798065]
We show that acting optimally in offline RL in a Bayesian sense involves solving an implicit POMDP.
As a result, optimal policies for offline RL must be adaptive, depending not just on the current state but rather all the transitions seen so far during evaluation.
We present a model-free algorithm for approximating this optimal adaptive policy, and demonstrate the efficacy of learning such adaptive policies in offline RL benchmarks.
arXiv Detail & Related papers (2022-07-05T17:58:33Z) - COptiDICE: Offline Constrained Reinforcement Learning via Stationary
Distribution Correction Estimation [73.17078343706909]
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.