On Imitation Learning of Linear Control Policies: Enforcing Stability
and Robustness Constraints via LMI Conditions
- URL: http://arxiv.org/abs/2103.12945v1
- Date: Wed, 24 Mar 2021 02:43:03 GMT
- Title: On Imitation Learning of Linear Control Policies: Enforcing Stability
and Robustness Constraints via LMI Conditions
- Authors: Aaron Havens and Bin Hu
- Abstract summary: We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy.
- Score: 3.296303220677533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When applying imitation learning techniques to fit a policy from expert
demonstrations, one can take advantage of prior stability/robustness
assumptions on the expert's policy and incorporate such control-theoretic prior
knowledge explicitly into the learning process. In this paper, we formulate the
imitation learning of linear policies as a constrained optimization problem,
and present efficient methods which can be used to enforce stability and
robustness constraints during the learning processes. Specifically, we show
that one can guarantee the closed-loop stability and robustness by posing
linear matrix inequality (LMI) constraints on the fitted policy. Then both the
projected gradient descent method and the alternating direction method of
multipliers (ADMM) method can be applied to solve the resulting constrained
policy fitting problem. Finally, we provide numerical results to demonstrate
the effectiveness of our methods in producing linear policies with various
stability and robustness guarantees.
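As a rough illustration of the approach described in the abstract, the sketch below is a minimal, non-authoritative example of LMI-constrained policy fitting: it runs gradient descent on the imitation loss and, after each step, approximately projects the gain back onto the set of stabilizing controllers by solving a small semidefinite program with CVXPY. The dynamics A, B, the expert data X, U, and all solver settings are placeholder assumptions, and the projection uses a standard convex surrogate rather than the authors' exact algorithm.
```python
# Minimal sketch of projected gradient descent for imitation learning of a
# linear policy u = K x with a closed-loop stability LMI constraint.
# Assumes known continuous-time dynamics (A, B) and expert data columns (X, U).
import numpy as np
import cvxpy as cp

def fit_stabilizing_gain(A, B, X, U, lr=1e-2, iters=100, eps=1e-3):
    n, m = B.shape
    K = np.zeros((m, n))                       # initial policy u = K x
    for _ in range(iters):
        # Gradient step on the imitation loss  (1/N) * ||K X - U||_F^2
        grad = 2.0 * (K @ X - U) @ X.T / X.shape[1]
        K_hat = K - lr * grad
        # Approximate projection onto the stabilizing set via the standard
        # change of variables K = Y Q^{-1}: A + B K is Hurwitz iff there exist
        # Q > 0 and Y with A Q + B Y + (A Q + B Y)^T < 0.
        Q = cp.Variable((n, n), symmetric=True)
        Y = cp.Variable((m, n))
        lmi = A @ Q + B @ Y + (A @ Q + B @ Y).T
        constraints = [Q >> np.eye(n), lmi << -eps * np.eye(n)]
        # Minimize the convex surrogate ||Y - K_hat Q||_F (a common heuristic;
        # the exact Frobenius projection in K is nonconvex).
        prob = cp.Problem(cp.Minimize(cp.norm(Y - K_hat @ Q, "fro")), constraints)
        prob.solve(solver=cp.SCS)
        K = Y.value @ np.linalg.inv(Q.value)   # recovered stabilizing gain
    return K
```
The same LMI feasibility set could serve as one block of an ADMM splitting, alternating between an unconstrained least-squares fit and the semidefinite step; robustness requirements of the kind discussed in the paper would enter as additional LMI constraints in that subproblem.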
Related papers
- SelfBC: Self Behavior Cloning for Offline Reinforcement Learning [14.573290839055316]
We propose a novel dynamic policy constraint that restricts the learned policy on the samples generated by the exponential moving average of previously learned policies.
Our approach results in a nearly monotonically improved reference policy.
arXiv Detail & Related papers (2024-08-04T23:23:48Z) - Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear
Systems via Sums-of-Squares Optimization [28.627377507894003]
We present a method for synthesizing reduced-order output-feedback policies for controlling nonlinear systems from perception-based observations.
We show that where policies learned directly from images can fail to reliably stabilize such systems, our approach can provide stability guarantees.
arXiv Detail & Related papers (2023-04-24T19:34:09Z) - Bounded Robustness in Reinforcement Learning via Lexicographic
Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z) - Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
arXiv Detail & Related papers (2022-03-24T01:04:17Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - Imitation Learning of Stabilizing Policies for Nonlinear Systems [1.52292571922932]
It is shown that the methods developed for linear systems and controllers can be readily extended to polynomial systems and controllers using sum-of-squares techniques.
A projected gradient descent algorithm and an alternating direction method of multipliers algorithm are proposed for the stabilizing imitation learning problem.
arXiv Detail & Related papers (2021-09-22T17:27:19Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Learning Constrained Adaptive Differentiable Predictive Control Policies
With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
arXiv Detail & Related papers (2020-04-23T14:24:44Z) - Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z)