Learning Robust Feedback Policies from Demonstrations
- URL: http://arxiv.org/abs/2103.16629v1
- Date: Tue, 30 Mar 2021 19:11:05 GMT
- Title: Learning Robust Feedback Policies from Demonstrations
- Authors: Abed AlRahman Al Makdah and Vishaal Krishnan and Fabio Pasqualetti
- Abstract summary: We propose and analyze a new framework to learn feedback control policies that exhibit provable guarantees on the closed-loop performance and robustness to bounded (adversarial) perturbations.
These policies are learned from expert demonstrations without any prior knowledge of the task, its cost function, and system dynamics.
- Score: 9.34612743192798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we propose and analyze a new framework to learn feedback control
policies that exhibit provable guarantees on the closed-loop performance and
robustness to bounded (adversarial) perturbations. These policies are learned
from expert demonstrations without any prior knowledge of the task, its cost
function, and system dynamics. In contrast to the existing algorithms in
imitation learning and inverse reinforcement learning, we use a
Lipschitz-constrained loss minimization scheme to learn control policies with
certified robustness. We establish robust stability of the closed-loop system
under the learned control policy and derive an upper bound on its regret, which
bounds the sub-optimality of the closed-loop performance with respect to the
expert policy. We also derive a robustness bound for the deterioration of the
closed-loop performance under bounded (adversarial) perturbations on the state
measurements. Ultimately, our results suggest the existence of an underlying
tradeoff between nominal closed-loop performance and adversarial robustness,
and that improvements in nominal closed-loop performance can only be made at
the expense of robustness to adversarial perturbations. Numerical results
validate our analysis and demonstrate the effectiveness of our robust feedback
policy learning framework.
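The core mechanism, minimizing an imitation loss over expert demonstrations subject to a Lipschitz constraint on the policy, can be illustrated with a small sketch. The snippet below is not the authors' implementation: it fits a linear policy u = Kx to synthetic expert data by projected gradient descent, where the projection clips the singular values of K so that its spectral norm (its Lipschitz constant) stays below a chosen budget L_MAX; the data, the budget, and the solver are all placeholder assumptions.

```python
import numpy as np

# Minimal sketch: Lipschitz-constrained imitation of an expert feedback policy.
# Data, dimensions, and the budget L_MAX are illustrative placeholders.

rng = np.random.default_rng(0)
n, m, N = 4, 2, 200           # state dim, input dim, number of demonstrations
L_MAX = 2.0                   # spectral-norm (Lipschitz) budget for the policy

K_expert = rng.normal(size=(m, n))                  # unknown expert gain
X = rng.normal(size=(n, N))                         # expert states
U = K_expert @ X + 0.01 * rng.normal(size=(m, N))   # noisy expert actions

def project_spectral(K, L):
    """Project K onto {K : ||K||_2 <= L} by clipping its singular values."""
    Uv, s, Vt = np.linalg.svd(K, full_matrices=False)
    return Uv @ np.diag(np.minimum(s, L)) @ Vt

K = np.zeros((m, n))
step = 1.0 / np.linalg.norm(X @ X.T / N, 2)         # step size from the loss curvature
for _ in range(300):
    grad = (K @ X - U) @ X.T / N                    # gradient of the imitation loss
    K = project_spectral(K - step * grad, L_MAX)    # projected gradient step

print("Lipschitz constant of learned policy:", round(np.linalg.norm(K, 2), 3))
print("relative imitation error:", round(np.linalg.norm(K @ X - U) / np.linalg.norm(U), 3))
```

Tightening L_MAX lowers the policy's gain (and hence its sensitivity to measurement perturbations) at the cost of a larger imitation error, which mirrors the performance-robustness tradeoff discussed in the abstract.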
Related papers
- Balancing policy constraint and ensemble size in uncertainty-based
offline reinforcement learning [7.462336024223669]
We study the role of policy constraints as a mechanism for regulating uncertainty.
By incorporating behavioural cloning into policy updates, we show that sufficient penalisation can be achieved with a much smaller ensemble size.
We show how such an approach can facilitate stable online fine tuning, allowing for continued policy improvement while avoiding severe performance drops.
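As a generic illustration of incorporating behavioural cloning into the policy update (in the style of TD3+BC, not this paper's exact algorithm), the actor loss below maximizes a critic's value while penalizing deviation from the dataset actions; the networks, the batch, and the weight bc_weight are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Schematic BC-regularised actor update; the critic is a fixed, randomly
# initialised stand-in and all data below is synthetic.
state_dim, action_dim = 8, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

states = torch.randn(256, state_dim)                 # placeholder offline batch
dataset_actions = torch.rand(256, action_dim) * 2 - 1
bc_weight = 2.5                                      # behavioural-cloning penalty weight

pi = actor(states)
q = critic(torch.cat([states, pi], dim=-1))
# Maximise the critic's value while staying close to the actions in the dataset.
actor_loss = -q.mean() + bc_weight * ((pi - dataset_actions) ** 2).mean()

opt.zero_grad()
actor_loss.backward()
opt.step()
```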
arXiv Detail & Related papers (2023-03-26T13:03:11Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE): given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
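HAMBO itself constructs adversarial ("hallucinated") rollouts inside the learned model's uncertainty set; the sketch below only shows the simpler, related idea of a conservative return estimate taken as the worst case over an ensemble of learned models. The models, policy, and reward below are invented placeholders rather than the paper's estimator.

```python
import numpy as np

# Pessimistic (conservative) value estimate from an ensemble of learned models.
# A simplified stand-in for uncertainty-aware OPE, not HAMBO itself.

rng = np.random.default_rng(1)
n_models, horizon, gamma = 5, 50, 0.99

def policy(x):
    return -0.5 * x                        # placeholder evaluation policy

# Placeholder "learned" models: each ensemble member is a slightly different
# scalar linear dynamics x' = a_i * x + u.
coeffs = 0.9 + 0.02 * rng.normal(size=n_models)

def rollout_return(a):
    x, ret = 1.0, 0.0
    for t in range(horizon):
        u = policy(x)
        ret += (gamma ** t) * (-(x ** 2) - 0.1 * u ** 2)   # quadratic reward
        x = a * x + u
    return ret

returns = np.array([rollout_return(a) for a in coeffs])
lower_bound = returns.min()                # worst model in the ensemble
print("per-model returns:", np.round(returns, 3))
print("conservative estimate:", round(lower_bound, 3))
```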
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Bounded Robustness in Reinforcement Learning via Lexicographic Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
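The paper develops this through lexicographic objectives; as a much cruder stand-in for trading expected utility against robustness to observational noise, the sketch below adds a sensitivity penalty, the squared change in the policy's output under perturbed observations, to a placeholder task loss. The network, noise scale, and weight are assumptions, not the paper's scheme.

```python
import torch
import torch.nn as nn

# Crude utility-vs-robustness trade-off: penalise the policy's sensitivity to
# observational noise. This is NOT the lexicographic scheme of the paper.
obs_dim, act_dim = 6, 2
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(128, obs_dim)
target_actions = torch.randn(128, act_dim)   # placeholder stand-in for task utility
noise_scale, robustness_weight = 0.1, 1.0    # larger weight => more robust, less utility

actions = policy(obs)
actions_noisy = policy(obs + noise_scale * torch.randn_like(obs))

task_loss = ((actions - target_actions) ** 2).mean()          # "utility" term
robustness_penalty = ((actions - actions_noisy) ** 2).mean()  # sensitivity to noise

loss = task_loss + robustness_weight * robustness_penalty
opt.zero_grad()
loss.backward()
opt.step()
print("task loss:", float(task_loss), "robustness penalty:", float(robustness_penalty))
```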
arXiv Detail & Related papers (2022-09-30T08:53:18Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
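A rough sketch of a penalized clipped surrogate in this spirit is given below: the standard PPO clipped term for the reward is combined with an exact-penalty (ReLU) term that activates when the estimated cost exceeds its limit. The tensors, the penalty weight kappa, and the cost limit are placeholders, and the precise form of P3O's objective should be taken from the paper rather than from this illustration.

```python
import torch

# Schematic penalised clipped surrogate in the spirit of P3O (placeholders throughout).
eps, kappa, cost_limit = 0.2, 20.0, 25.0

log_prob_new = torch.randn(256, requires_grad=True)   # stand-in for policy log-probs
log_prob_old = log_prob_new.detach() + 0.05 * torch.randn(256)
adv_reward = torch.randn(256)                         # reward advantages
adv_cost = torch.randn(256)                           # cost advantages
episode_cost = torch.tensor(27.0)                     # estimated expected episode cost

ratio = torch.exp(log_prob_new - log_prob_old)

# Clipped surrogate for the reward objective (standard PPO term).
reward_surrogate = torch.min(ratio * adv_reward,
                             torch.clamp(ratio, 1 - eps, 1 + eps) * adv_reward).mean()

# Exact-penalty term: only active when the cost constraint is predicted to be violated.
cost_surrogate = (ratio * adv_cost).mean() + episode_cost - cost_limit
penalty = kappa * torch.relu(cost_surrogate)

loss = -reward_surrogate + penalty
loss.backward()
print("loss:", float(loss))
```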
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
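For a linear value function and a finite set of test functions, enforcing the Bellman equation only along the test-function space reduces to a handful of moment conditions that can be solved in closed form. The sketch below does exactly that on synthetic data, with the test functions chosen equal to the features (an LSTD-like special case); everything in it is an illustrative assumption rather than the paper's construction.

```python
import numpy as np

# Moment-condition sketch: make the Bellman residual orthogonal to a small set
# of test functions. Features, test functions, and data are synthetic.

rng = np.random.default_rng(2)
gamma, N = 0.95, 1000

# Synthetic 1-D chain under a fixed policy: s' = 0.8*s + noise, r = -s^2.
s = rng.normal(size=N)
s_next = 0.8 * s + 0.1 * rng.normal(size=N)
r = -(s ** 2)

def phi(x):            # value-function features
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

psi = phi              # test functions; reusing the features is an LSTD-like choice

Phi, Phi_next, Psi = phi(s), phi(s_next), psi(s)
# Solve  Psi^T (Phi - gamma*Phi_next) w = Psi^T r  so that the Bellman residual
# r + gamma*V(s') - V(s) is orthogonal to every test function.
A = Psi.T @ (Phi - gamma * Phi_next) / N
b = Psi.T @ r / N
w = np.linalg.solve(A, b)
print("value-function weights:", np.round(w, 3))
```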
arXiv Detail & Related papers (2022-03-24T01:04:17Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the resulting off-policy distribution shift, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
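The optimistic-exploration half of this recipe can be sketched as training an exploration actor against an approximate upper confidence bound formed from two critics, using their disagreement as a cheap uncertainty proxy; the DICE distribution-correction step is omitted, and the networks, data, and bonus weight beta are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Sketch: train an exploration actor against an approximate UCB of two critics.
# The DICE distribution correction from the paper is not shown here.
state_dim, action_dim = 8, 2

def make_critic():
    return nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1))

q1, q2 = make_critic(), make_critic()
explorer = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, action_dim), nn.Tanh())
opt = torch.optim.Adam(explorer.parameters(), lr=3e-4)
beta = 1.0                                 # weight on the uncertainty bonus

states = torch.randn(256, state_dim)       # placeholder batch
actions = explorer(states)
sa = torch.cat([states, actions], dim=-1)
q_a, q_b = q1(sa), q2(sa)

mean_q = 0.5 * (q_a + q_b)
disagreement = (q_a - q_b).abs()           # cheap stand-in for critic uncertainty
ucb = mean_q + beta * disagreement

loss = -ucb.mean()                         # the explorer maximises the optimistic value
opt.zero_grad()
loss.backward()
opt.step()
```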
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that quickly learn optimal actions by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
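As a loose illustration of certainty-equivalent control with randomized parameter estimates (not the authors' algorithm or regret analysis), the sketch below identifies (A, B) by least squares from an excited trajectory, perturbs the estimate slightly, and computes the continuous-time LQR gain from the resulting Riccati equation; the system, noise levels, and cost matrices are invented.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Certainty-equivalent continuous-time LQR with lightly randomized estimates.
# The system, excitation, and noise levels below are placeholders.

rng = np.random.default_rng(3)
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # true (unknown) dynamics
B = np.array([[0.0], [1.0]])
dt, T = 0.01, 2000

# Excite the system with random inputs and record an Euler-discretised trajectory.
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for k in range(T):
    X[k + 1] = X[k] + dt * (A @ X[k] + B @ U[k]) + 0.001 * rng.normal(size=2)

# Least-squares estimate of [A B] from finite-difference derivatives.
derivs = (X[1:] - X[:-1]) / dt
Z = np.hstack([X[:-1], U])
theta, *_ = np.linalg.lstsq(Z, derivs, rcond=None)
A_hat, B_hat = theta.T[:, :2], theta.T[:, 2:]

# Randomize the estimate slightly before computing the certainty-equivalent gain.
A_rand = A_hat + 0.01 * rng.normal(size=A_hat.shape)
Q, R = np.eye(2), np.eye(1)
P = solve_continuous_are(A_rand, B_hat, Q, R)
K = np.linalg.inv(R) @ B_hat.T @ P         # LQR gain K = R^{-1} B^T P

print("estimated A:\n", np.round(A_hat, 3))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```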
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions [3.296303220677533]
We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy.
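For a fixed fitted gain K, closed-loop stability can be certified by the convex feasibility problem of finding a positive definite P with (A+BK)^T P (A+BK) - P negative definite. The sketch below only performs this post-hoc check, whereas the paper imposes LMI constraints inside the imitation-learning optimization itself; the system matrices and the gain K are placeholders.

```python
import numpy as np
import cvxpy as cp

# Certify closed-loop stability of a fitted linear policy u = Kx via a Lyapunov LMI.
# This checks a given K after fitting; the paper instead enforces LMI constraints
# during the policy-fitting optimisation.

A = np.array([[1.0, 0.1], [0.0, 1.0]])     # placeholder discrete-time dynamics
B = np.array([[0.0], [0.1]])
K = np.array([[-5.0, -6.0]])               # placeholder fitted feedback gain
A_cl = A + B @ K

n = A.shape[0]
P = cp.Variable((n, n), symmetric=True)
constraints = [P >> np.eye(n),                                   # P positive definite
               A_cl.T @ P @ A_cl - P << -1e-3 * np.eye(n)]       # Lyapunov decrease
problem = cp.Problem(cp.Minimize(0), constraints)
problem.solve()

print("LMI status:", problem.status)       # 'optimal' => stability certificate found
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A_cl))))
```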
arXiv Detail & Related papers (2021-03-24T02:43:03Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.