Diffusion-Reward Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2405.16194v4
- Date: Tue, 26 Nov 2024 02:47:01 GMT
- Title: Diffusion-Reward Adversarial Imitation Learning
- Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun
- Abstract summary: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments.
Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning.
We propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL.
- Score: 33.81857550294019
- License:
- Abstract: Imitation learning aims to learn a policy from observing expert demonstrations, without access to reward signals from the environment. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy that learns to imitate expert behaviors and a discriminator that learns to distinguish expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, we propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL to yield more robust and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator, and design diffusion rewards based on the classifier's output for policy learning. Extensive experiments in navigation, manipulation, and locomotion verify DRAIL's effectiveness compared to prior imitation learning methods, and additional results demonstrate its generalizability and data efficiency. Visualizations of the learned reward functions of GAIL and DRAIL suggest that DRAIL produces more robust and smoother rewards. Project page: https://nturobotlearninglab.github.io/DRAIL/
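To make the reward construction concrete, the following is a minimal PyTorch sketch of the general idea rather than the authors' implementation: a conditional denoiser over (state, action) vectors whose expert- vs. agent-conditioned denoising errors are mapped to a discriminator probability and then to a GAIL-style reward. The module and function names, the network architecture, and the sigmoid-of-error-gap mapping are illustrative assumptions; `alphas_cumprod` is assumed to be the usual 1-D tensor of cumulative noise-schedule products.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondDenoiser(nn.Module):
    """Noise predictor for (state, action) vectors, conditioned on a binary
    class label (1 = expert, 0 = agent) and the diffusion timestep.
    Architecture is an illustrative assumption, not the paper's."""
    def __init__(self, x_dim, hidden=256, n_timesteps=100):
        super().__init__()
        self.t_emb = nn.Embedding(n_timesteps, hidden)
        self.c_emb = nn.Embedding(2, hidden)
        self.net = nn.Sequential(
            nn.Linear(x_dim + 2 * hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t, c):
        h = torch.cat([x_t, self.t_emb(t), self.c_emb(c)], dim=-1)
        return self.net(h)

def denoising_error(model, x0, c, alphas_cumprod, n_samples=4):
    """Average noise-prediction error of x0 under class condition c,
    estimated over a few random diffusion timesteps."""
    errs = []
    for _ in range(n_samples):
        t = torch.randint(0, alphas_cumprod.shape[0], (x0.shape[0],), device=x0.device)
        a = alphas_cumprod[t].unsqueeze(-1)
        noise = torch.randn_like(x0)
        x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise   # forward diffusion step
        err = F.mse_loss(model(x_t, t, c), noise, reduction="none").mean(dim=-1)
        errs.append(err)
    return torch.stack(errs).mean(dim=0)

def diffusion_reward(model, state, action, alphas_cumprod):
    """Map the agent-vs-expert denoising-error gap to a GAIL-style reward.
    The sigmoid-of-gap mapping below is an illustrative choice."""
    x0 = torch.cat([state, action], dim=-1)
    ones = torch.ones(x0.shape[0], dtype=torch.long, device=x0.device)
    zeros = torch.zeros_like(ones)
    err_expert = denoising_error(model, x0, ones, alphas_cumprod)
    err_agent = denoising_error(model, x0, zeros, alphas_cumprod)
    d = torch.sigmoid(err_agent - err_expert)            # ~ P(expert | s, a)
    return -torch.log(1.0 - d + 1e-8)                    # GAIL-style reward
```

In a full training loop, the denoiser would be fit with the standard conditional diffusion loss on expert transitions (label 1) and agent rollouts (label 0), alternating with policy updates that consume `diffusion_reward`, mirroring GAIL's discriminator/policy alternation.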
Related papers
- Diffusion Imitation from Observation [4.205946699819021]
Adversarial imitation learning approaches learn a generator agent policy that produces state transitions a discriminator cannot distinguish from expert transitions.
Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework.
arXiv Detail & Related papers (2024-10-07T18:49:55Z) - Diffusion Reward: Learning Rewards via Conditional Video Diffusion [26.73119637442011]
Diffusion Reward is a framework that learns rewards from expert videos via conditional video diffusion models.
We show the efficacy of our method over robotic manipulation tasks in both simulation platforms and the real world with visual input.
arXiv Detail & Related papers (2023-12-21T18:55:05Z) - DiffAIL: Diffusion Adversarial Imitation Learning [32.90853955228524]
Imitation learning aims to sidestep the difficulty of defining reward functions in real-world decision-making tasks.
We propose a method named diffusion adversarial imitation learning (DiffAIL).
Our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks.
arXiv Detail & Related papers (2023-12-11T12:53:30Z) - Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z) - Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning.
arXiv Detail & Related papers (2021-11-23T08:06:09Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - $f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning [29.459037918810143]
Imitation learning aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors.
Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency?
We propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model, that automatically learns a discrepancy measure and a policy capable of producing expert-like behaviors.
arXiv Detail & Related papers (2020-10-02T21:39:56Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration; a schematic sketch of this weighting follows below.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
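As a rough illustration of the weighting mechanism described above, here is a minimal sketch under assumed tensor shapes; how the per-state weight is computed (e.g., from an auxiliary imitator's agreement with the teacher) is a hypothetical choice, not ADVISOR's exact rule.

```python
import torch.nn.functional as F

def mixed_imitation_rl_loss(policy_logits, teacher_actions, rl_loss_per_step, weight):
    """Blend an imitation loss and an RL loss with a per-state weight.

    policy_logits: [B, A] student action logits.
    teacher_actions: [B] actions chosen by the (privileged) teacher.
    rl_loss_per_step: [B] per-step reward-based policy losses.
    weight: [B] values in [0, 1]; near 1 where imitating the teacher is trusted,
    near 0 where the agent should rely on reward-driven exploration instead
    (how this weight is obtained is assumed here, not taken from the paper).
    """
    imitation = F.cross_entropy(policy_logits, teacher_actions, reduction="none")
    return (weight * imitation + (1.0 - weight) * rl_loss_per_step).mean()
```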