On the Benefits of Inducing Local Lipschitzness for Robust Generative
Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2107.00116v3
- Date: Mon, 15 Jan 2024 20:05:14 GMT
- Title: On the Benefits of Inducing Local Lipschitzness for Robust Generative
Adversarial Imitation Learning
- Authors: Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu
- Abstract summary: We study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL.
We show that the modified objective leads to learning significantly more robust policies.
- Score: 36.48610705372544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore methodologies to improve the robustness of generative adversarial
imitation learning (GAIL) algorithms to observation noise. Towards this
objective, we study the effect of local Lipschitzness of the discriminator and
the generator on the robustness of policies learned by GAIL. In many robotics
applications, policies learned by GAIL typically suffer degraded
performance at test time since the observations from the environment might be
corrupted by noise. Hence, robustifying the learned policies against the
observation noise is of critical importance. To this end, we propose a
regularization method to induce local Lipschitzness in the generator and the
discriminator of adversarial imitation learning methods. We show that the
modified objective leads to learning significantly more robust policies.
Moreover, we demonstrate -- both theoretically and experimentally -- that
training a locally Lipschitz discriminator leads to a locally Lipschitz
generator, thereby improving the robustness of the resultant policy. We perform
extensive experiments on simulated robot locomotion environments from the
MuJoCo suite that demonstrate the proposed method learns policies that
significantly outperform the state-of-the-art generative adversarial imitation
learning algorithm when applied to test scenarios with noise-corrupted
observations.
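The abstract does not spell out the regularizer itself, but a common way to induce local Lipschitzness is to penalize the norm of the discriminator's gradient at points sampled near the training observations. The PyTorch sketch below illustrates that idea; the discriminator architecture, the perturbation radius eps, and the penalty weight lam are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Simple MLP discriminator over (state, action) pairs."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def local_lipschitz_penalty(disc, obs, act, eps=0.1):
    """Penalize the squared gradient norm of the discriminator at points
    sampled inside an eps-ball around the training observations, which
    encourages local (rather than global) Lipschitzness."""
    noisy_obs = (obs + eps * torch.randn_like(obs)).detach().requires_grad_(True)
    scores = disc(noisy_obs, act)
    grads, = torch.autograd.grad(scores.sum(), noisy_obs, create_graph=True)
    return (grads.norm(dim=-1) ** 2).mean()

# Hypothetical usage inside the discriminator update (lam weights the penalty):
# loss = bce(disc(expert_obs, expert_act), ones) \
#      + bce(disc(policy_obs, policy_act), zeros) \
#      + lam * local_lipschitz_penalty(disc, policy_obs, policy_act)
```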
Related papers
- Robust Behavior Cloning Via Global Lipschitz Regularization [0.5767156832161817]
Behavior cloning is an effective imitation learning technique that has been adopted even in safety-critical domains such as autonomous vehicles. We use a global Lipschitz regularization approach to enhance the robustness of the learned policy network, and we propose a way to construct a Lipschitz neural network that guarantees policy robustness.
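One standard way to obtain such a Lipschitz network (not necessarily the construction used in that paper) is to bound each layer's spectral norm, since a composition of 1-Lipschitz layers and 1-Lipschitz activations is itself 1-Lipschitz. A minimal PyTorch sketch under that assumption:

```python
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lipschitz_policy(obs_dim, act_dim, hidden=64):
    """Spectral normalization keeps each linear layer's spectral norm
    approximately at 1 (via power iteration), and ReLU/Tanh are
    1-Lipschitz, so the whole network is roughly 1-Lipschitz in the
    l2 sense (the per-layer constants multiply under composition)."""
    return nn.Sequential(
        spectral_norm(nn.Linear(obs_dim, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, act_dim)), nn.Tanh(),
    )
```

Note the contrast with the main paper above, which regularizes toward local Lipschitzness around visited observations rather than constraining the network globally.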
arXiv Detail & Related papers (2025-06-24T02:19:08Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Certifiably Robust Reinforcement Learning through Model-Based Abstract Interpretation [10.69970450827617]
We present a reinforcement learning framework in which the learned policy comes with a machine-checkable certificate of provable adversarial robustness.
We experimentally evaluate CAROL on four MuJoCo environments with continuous state and action spaces.
Compared with policies from state-of-the-art robust RL algorithms, the policies CAROL learns exhibit (i) markedly improved certified lower bounds on performance and (ii) comparable performance under empirical adversarial attacks.
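As an illustration of the abstract-interpretation idea behind such certificates, the NumPy sketch below propagates interval bounds through a small ReLU network: any input inside the given box is guaranteed to map into the computed output box. This is only the simplest abstract domain and not CAROL's actual certification procedure, which reasons over a learned environment model.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through x -> W @ x + b.
    Positive weights take their lower bound from lo, negative ones from hi."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_mlp(lo, hi, layers):
    """layers: list of (W, b). ReLU is monotone, so it maps interval
    endpoints to interval endpoints between the linear layers."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_linear(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy example: certified output box for all observations within an eps = 0.01 box.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), np.zeros(8)),
          (rng.standard_normal((2, 8)), np.zeros(2))]
obs = np.array([0.3, -0.1, 0.2, 0.05])
lo, hi = interval_mlp(obs - 0.01, obs + 0.01, layers)
```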
arXiv Detail & Related papers (2023-01-26T19:42:58Z)
- Risk-Sensitive Reinforcement Learning with Exponential Criteria [0.0]
We provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them.
We introduce a novel online Actor-Critic algorithm based on solving a multiplicative Bellman equation using approximation updates.
The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
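The exponential criterion referred to here replaces the expected return E[R] with the exponential utility (1/beta) log E[exp(beta R)]; for beta < 0 this objective penalizes return variance and yields risk-averse, more robust behavior. The NumPy sketch below estimates the criterion from sampled episode returns; it illustrates the objective only, not the paper's actor-critic algorithm.

```python
import numpy as np

def exponential_criterion(returns, beta=-0.5):
    """Monte Carlo estimate of (1/beta) * log E[exp(beta * R)].
    A Taylor expansion gives roughly E[R] + (beta/2) Var[R], so beta < 0
    is risk-averse and beta -> 0 recovers the ordinary expected return."""
    z = beta * np.asarray(returns, dtype=np.float64)
    m = z.max()  # log-sum-exp trick for numerical stability
    return (m + np.log(np.mean(np.exp(z - m)))) / beta

returns = [100.0, 102.0, 40.0, 101.0]               # one bad episode
print(exponential_criterion(returns, beta=-0.5))    # ~42.8, well below the mean
print(np.mean(returns))                             # 85.75
```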
arXiv Detail & Related papers (2022-12-18T04:44:38Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
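Concretely, policy smoothing evaluates the policy on Gaussian-perturbed observations, and the certificate applies to the resulting smoothed policy. The NumPy sketch below shows one simple way to smooth a deterministic continuous-control policy by averaging its action over sampled noise; the toy policy, sigma, and sample count are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def smoothed_action(policy, obs, sigma=0.1, n_samples=64, rng=None):
    """Monte Carlo approximation of the Gaussian-smoothed policy
        pi_smooth(obs) = E_{d ~ N(0, sigma^2 I)}[ pi(obs + d) ].
    The smoothed policy changes slowly under small input perturbations,
    which is the property the robustness certificates exploit."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=(n_samples,) + np.shape(obs))
    actions = np.stack([policy(obs + d) for d in noise])
    return actions.mean(axis=0)

# Example with a toy deterministic policy:
policy = lambda o: np.tanh(o[:2] - o[2:])
obs = np.array([0.3, -0.1, 0.2, 0.05])
print(smoothed_action(policy, obs, sigma=0.05))
```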
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Robust Imitation Learning from Noisy Demonstrations [81.67837507534001]
We show that robust imitation learning can be achieved by optimizing a classification risk with a symmetric loss.
We propose a new imitation learning method that effectively combines pseudo-labeling with co-training.
Experimental results on continuous-control benchmarks show that our method is more robust compared to state-of-the-art methods.
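A symmetric loss is one where the losses of a margin z and its negation sum to a constant, which is what makes the classification risk insensitive to symmetric label noise in the demonstrations. The NumPy sketch below contrasts the symmetric sigmoid loss with the ordinary, non-symmetric logistic loss; it illustrates the property only, not the paper's full pseudo-labeling and co-training method.

```python
import numpy as np

def sigmoid_loss(z):
    """Symmetric loss: sigmoid_loss(z) + sigmoid_loss(-z) == 1 for every
    margin z, the property behind robustness to symmetric label noise."""
    return 1.0 / (1.0 + np.exp(z))

def logistic_loss(z):
    """Ordinary logistic loss, not symmetric: the sum below depends on z."""
    return np.log1p(np.exp(-z))

z = np.linspace(-3, 3, 7)
print(sigmoid_loss(z) + sigmoid_loss(-z))    # constant 1.0
print(logistic_loss(z) + logistic_loss(-z))  # varies with z
```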
arXiv Detail & Related papers (2020-10-20T10:41:37Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)