Follow the Clairvoyant: an Imitation Learning Approach to Optimal
Control
- URL: http://arxiv.org/abs/2211.07389v1
- Date: Mon, 14 Nov 2022 14:15:12 GMT
- Title: Follow the Clairvoyant: an Imitation Learning Approach to Optimal
Control
- Authors: Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros,
Giancarlo Ferrari-Trecate
- Abstract summary: We consider control of dynamical systems through the lens of competitive analysis.
Motivated by the observation that the optimal cost only provides coarse information about the ideal closed-loop behavior, we propose directly minimizing the tracking error relative to the optimal trajectories in hindsight.
- Score: 4.978565634673048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider control of dynamical systems through the lens of competitive
analysis. Most prior work in this area focuses on minimizing regret, that is,
the loss relative to an ideal clairvoyant policy that has noncausal access to
past, present, and future disturbances. Motivated by the observation that the
optimal cost only provides coarse information about the ideal closed-loop
behavior, we instead propose directly minimizing the tracking error relative to
the optimal trajectories in hindsight, i.e., imitating the clairvoyant policy.
By embracing a system level perspective, we present an efficient
optimization-based approach for computing follow-the-clairvoyant (FTC) safe
controllers. We prove that these attain minimal regret if no constraints are
imposed on the noncausal benchmark. In addition, we present numerical
experiments to show that our policy retains the hallmark of competitive
algorithms of interpolating between classical $\mathcal{H}_2$ and
$\mathcal{H}_\infty$ control laws - while consistently outperforming regret
minimization methods in constrained scenarios thanks to the superior ability to
chase the clairvoyant.
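To make the contrast concrete (with notation that is schematic rather than the paper's): regret minimization compares costs, roughly $\min_{\pi\,\mathrm{causal}} \max_{w} \big( J(\pi, w) - J(\pi^\star, w) \big)$, whereas follow-the-clairvoyant penalizes the distance between the closed-loop state-input trajectory and the clairvoyant one, roughly $\min_{\pi\,\mathrm{causal}} \max_{w} \| (x_\pi(w), u_\pi(w)) - (x_{\pi^\star}(w), u_{\pi^\star}(w)) \|^2$. The Python sketch below illustrates only the tracked quantity, not the paper's system level synthesis: it assumes a toy double integrator, a hand-picked stabilizing gain, and cvxpy for the hindsight problem, computes the clairvoyant trajectory for one disturbance realization, and evaluates the tracking error a simple causal controller incurs against it.

import numpy as np
import cvxpy as cp

# Toy double-integrator system and one disturbance realization (all values
# assumed for illustration; not taken from the paper).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
T = 20
w = 0.1 * np.random.default_rng(0).standard_normal((T, 2))

# Clairvoyant benchmark: with noncausal access to the whole disturbance
# sequence, solve for the cost-minimizing trajectory in hindsight.
x = cp.Variable((T + 1, 2))
u = cp.Variable((T, 1))
constraints = [x[0] == np.zeros(2)]
for t in range(T):
    constraints.append(x[t + 1] == A @ x[t] + B @ u[t] + w[t])
cp.Problem(cp.Minimize(cp.sum_squares(x) + cp.sum_squares(u)), constraints).solve()
x_star = x.value

# Roll out a simple causal state-feedback law (stabilizing gain chosen by hand)
# on the same disturbance realization.
K = np.array([[-0.6, -1.2]])
x_cl = np.zeros((T + 1, 2))
for t in range(T):
    x_cl[t + 1] = A @ x_cl[t] + B @ (K @ x_cl[t]) + w[t]

# Tracking error relative to the clairvoyant trajectory: the quantity that a
# follow-the-clairvoyant controller is designed to keep small for every w.
print("squared tracking error:", np.sum((x_cl - x_star) ** 2))

In the constrained scenarios discussed above, both the hindsight problem and the causal synthesis would additionally carry state and input constraints, which is where the paper reports the advantage of chasing the clairvoyant over minimizing regret.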
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
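As a rough reading of this summary (notation ours, not from that paper), the combined objective has the flavor of $\min_\theta \mathcal{L}_{\mathrm{pref}}(\theta) + \lambda\,\mathcal{L}_{\mathrm{SFT}}(\theta)$, with the supervised term acting as the implicit regularizer named in the title.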
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Rate-Optimal Online Convex Optimization in Adaptive Linear Control [0.0]
We consider the problem of controlling an unknown linear system under adversarially changing convex costs.
We present the first computationally-efficient algorithm that attains an optimal regret rate with respect to the best linear controller in hindsight.
arXiv Detail & Related papers (2022-06-03T07:32:11Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes a policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics.
arXiv Detail & Related papers (2021-06-22T23:14:21Z) - A Generalised Inverse Reinforcement Learning Framework [24.316047317028147]
The goal of inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of some MDP based on observed trajectories.
We introduce an alternative training loss that puts more weight on future states, which yields a reformulation of the (maximum entropy) IRL problem.
The algorithms we devised exhibit enhanced performance (and similar tractability) compared to off-the-shelf ones in multiple OpenAI Gym environments.
arXiv Detail & Related papers (2021-05-25T10:30:45Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.