Diffusion-Reward Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2405.16194v1
- Date: Sat, 25 May 2024 11:53:23 GMT
- Title: Diffusion-Reward Adversarial Imitation Learning
- Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun
- Abstract summary: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments.
Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning.
Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL).
- Score: 33.81857550294019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy that learns to imitate expert behaviors and a discriminator that learns to distinguish the expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator; then, we design diffusion rewards based on the classifier's output for policy learning. We conduct extensive experiments in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualizations of the learned reward functions of GAIL and DRAIL suggest that DRAIL produces more precise and smoother rewards.
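The mechanism described in the abstract lends itself to a compact sketch. Below is a minimal, hedged illustration of the general idea, assuming a simple MLP epsilon-predictor over concatenated state-action vectors, a linear noise schedule, and a sigmoid mapping from denoising error to an "expert-likeness" probability; the names (`DiffusionDiscriminator`, `diffusion_reward`) and every design choice here are illustrative assumptions, not the paper's exact architecture or reward formulation.

```python
import torch
import torch.nn as nn


class DiffusionDiscriminator(nn.Module):
    """Scores state-action pairs by how well a small denoising network
    reconstructs the noise added to them (illustrative sketch only)."""

    def __init__(self, sa_dim: int, hidden: int = 256, n_timesteps: int = 100):
        super().__init__()
        self.n_timesteps = n_timesteps
        # Linear noise schedule: beta_t controls the noise injected at step t.
        beta = torch.linspace(1e-4, 2e-2, n_timesteps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - beta, dim=0))
        # Epsilon-predictor conditioned on the noised sample and the timestep.
        self.eps_net = nn.Sequential(
            nn.Linear(sa_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, sa_dim),
        )
        # Learnable threshold calibrating denoising error -> probability.
        self.bias = nn.Parameter(torch.zeros(1))

    def denoising_error(self, sa: torch.Tensor) -> torch.Tensor:
        """Per-sample epsilon-prediction MSE at a random diffusion step."""
        t = torch.randint(0, self.n_timesteps, (sa.shape[0],), device=sa.device)
        a_bar = self.alpha_bar[t].unsqueeze(-1)
        eps = torch.randn_like(sa)
        noised = a_bar.sqrt() * sa + (1.0 - a_bar).sqrt() * eps
        t_emb = t.float().unsqueeze(-1) / self.n_timesteps
        eps_hat = self.eps_net(torch.cat([noised, t_emb], dim=-1))
        return ((eps_hat - eps) ** 2).mean(dim=-1)

    def classify(self, sa: torch.Tensor) -> torch.Tensor:
        """Low denoising error -> probability near 1 ('expert-like')."""
        return torch.sigmoid(self.bias - self.denoising_error(sa))


def discriminator_loss(disc, expert_sa, agent_sa):
    """Binary-classification surrogate: learn to denoise expert pairs
    well and agent pairs poorly."""
    d_exp, d_agt = disc.classify(expert_sa), disc.classify(agent_sa)
    return -(torch.log(d_exp + 1e-8).mean() + torch.log(1 - d_agt + 1e-8).mean())


def diffusion_reward(disc, states, actions, eps=1e-8):
    """GAIL-style reward r = log D - log(1 - D) from the classifier output."""
    with torch.no_grad():
        d = disc.classify(torch.cat([states, actions], dim=-1)).clamp(eps, 1 - eps)
    return torch.log(d) - torch.log(1.0 - d)
```

The intuition the sketch captures: because the score is derived from a denoising objective evaluated across noise levels rather than a single sharp decision boundary, it tends to vary smoothly with the input, which is consistent with the paper's reported smoother reward landscapes.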
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
Adversarial variants of Imitation Learning and Inverse Reinforcement Learning offer an alternative by learning policies from expert demonstrations via a discriminator.
We propose RILe, a teacher-student system that achieves both robustness to imperfect data and efficiency.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- DiffAIL: Diffusion Adversarial Imitation Learning [32.90853955228524]
Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks.
We propose a method named diffusion adversarial imitation learning (DiffAIL).
Our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks.
arXiv Detail & Related papers (2023-12-11T12:53:30Z)
- Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels for the unlabeled samples based on the confidence of the preference predictor; a minimal sketch of this filtering step follows this entry.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
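A minimal sketch of that confidence-based pseudo-labeling, assuming a `predictor` callable that returns P(segment A preferred) for a batch of segment pairs; the helper name, interface, and threshold value are hypothetical, not SURF's actual code:

```python
import torch

def pseudo_label_preferences(predictor, seg_a, seg_b, threshold=0.95):
    """Keep only unlabeled segment pairs on which the preference
    predictor is confident, and pseudo-label them (illustrative)."""
    with torch.no_grad():
        p = predictor(seg_a, seg_b)         # assumed: P(seg_a preferred), shape (batch,)
    confidence = torch.maximum(p, 1.0 - p)  # confidence in either direction
    keep = confidence >= threshold          # discard uncertain pairs
    labels = (p >= 0.5).float()             # 1 if seg_a preferred, else 0
    return seg_a[keep], seg_b[keep], labels[keep]
```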
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
- $f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning [29.459037918810143]
Imitation learning aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors.
Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency?
We propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure and a policy capable of producing expert-like behaviors.
arXiv Detail & Related papers (2020-10-02T21:39:56Z)
- Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in reinforcement learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm that is sample-efficient and achieves good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.