On Imitation Learning of Linear Control Policies: Enforcing Stability
and Robustness Constraints via LMI Conditions
- URL: http://arxiv.org/abs/2103.12945v1
- Date: Wed, 24 Mar 2021 02:43:03 GMT
- Title: On Imitation Learning of Linear Control Policies: Enforcing Stability
and Robustness Constraints via LMI Conditions
- Authors: Aaron Havens and Bin Hu
- Abstract summary: We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy.
- Score: 3.296303220677533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When applying imitation learning techniques to fit a policy from expert
demonstrations, one can take advantage of prior stability/robustness
assumptions on the expert's policy and incorporate such control-theoretic prior
knowledge explicitly into the learning process. In this paper, we formulate the
imitation learning of linear policies as a constrained optimization problem,
and present efficient methods which can be used to enforce stability and
robustness constraints during the learning processes. Specifically, we show
that one can guarantee the closed-loop stability and robustness by posing
linear matrix inequality (LMI) constraints on the fitted policy. Then both the
projected gradient descent method and the alternating direction method of
multipliers (ADMM) method can be applied to solve the resulting constrained
policy fitting problem. Finally, we provide numerical results to demonstrate
the effectiveness of our methods in producing linear policies with various
stability and robustness guarantees.
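As a rough illustration of the approach described in the abstract, the sketch below is a minimal, non-authoritative example of LMI-constrained policy fitting: it runs gradient descent on the imitation loss and, after each step, approximately projects the gain back onto the set of stabilizing controllers by solving a small semidefinite program with CVXPY. The dynamics A, B, the expert data X, U, and all solver settings are placeholder assumptions, and the projection uses a standard convex surrogate rather than the authors' exact algorithm.
```python
# Minimal sketch of projected gradient descent for imitation learning of a
# linear policy u = K x with a closed-loop stability LMI constraint.
# Assumes known continuous-time dynamics (A, B) and expert data columns (X, U).
import numpy as np
import cvxpy as cp

def fit_stabilizing_gain(A, B, X, U, lr=1e-2, iters=100, eps=1e-3):
    n, m = B.shape
    K = np.zeros((m, n))                       # initial policy u = K x
    for _ in range(iters):
        # Gradient step on the imitation loss  (1/N) * ||K X - U||_F^2
        grad = 2.0 * (K @ X - U) @ X.T / X.shape[1]
        K_hat = K - lr * grad
        # Approximate projection onto the stabilizing set via the standard
        # change of variables K = Y Q^{-1}: A + B K is Hurwitz iff there exist
        # Q > 0 and Y with A Q + B Y + (A Q + B Y)^T < 0.
        Q = cp.Variable((n, n), symmetric=True)
        Y = cp.Variable((m, n))
        lmi = A @ Q + B @ Y + (A @ Q + B @ Y).T
        constraints = [Q >> np.eye(n), lmi << -eps * np.eye(n)]
        # Minimize the convex surrogate ||Y - K_hat Q||_F (a common heuristic;
        # the exact Frobenius projection in K is nonconvex).
        prob = cp.Problem(cp.Minimize(cp.norm(Y - K_hat @ Q, "fro")), constraints)
        prob.solve(solver=cp.SCS)
        K = Y.value @ np.linalg.inv(Q.value)   # recovered stabilizing gain
    return K
```
The same LMI feasibility set could serve as one block of an ADMM splitting, alternating between an unconstrained least-squares fit and the semidefinite step; robustness requirements of the kind discussed in the paper would enter as additional LMI constraints in that subproblem.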
Related papers
- SelfBC: Self Behavior Cloning for Offline Reinforcement Learning [14.573290839055316]
We propose a novel dynamic policy constraint that restricts the learned policy on the samples generated by the exponential moving average of previously learned policies.
Our approach results in a nearly monotonically improved reference policy.
arXiv Detail & Related papers (2024-08-04T23:23:48Z) - Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear
Systems via Sums-of-Squares Optimization [28.627377507894003]
We present a method for synthesizing reduced-order output-feedback policies for controlling nonlinear systems from perception-based observations.
We show that where policies learned directly from images can fail to reliably stabilize such systems, our approach can provide stability guarantees.
arXiv Detail & Related papers (2023-04-24T19:34:09Z) - Bounded Robustness in Reinforcement Learning via Lexicographic
Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z) - Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
arXiv Detail & Related papers (2022-03-24T01:04:17Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - Imitation Learning of Stabilizing Policies for Nonlinear Systems [1.52292571922932]
It is shown that the methods developed for linear systems and controllers can be readily extended to polynomial systems and controllers using sum-of-squares techniques.
A projected gradient descent algorithm and an alternating direction method of multipliers algorithm are proposed for the stabilizing imitation learning problem.
arXiv Detail & Related papers (2021-09-22T17:27:19Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Learning Constrained Adaptive Differentiable Predictive Control Policies
With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
arXiv Detail & Related papers (2020-04-23T14:24:44Z) - Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z)