Diffusion-Reward Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2405.16194v1
- Date: Sat, 25 May 2024 11:53:23 GMT
- Title: Diffusion-Reward Adversarial Imitation Learning
- Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun
- Abstract summary: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments.
Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning.
Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL).
- Score: 33.81857550294019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy that learns to imitate expert behaviors and a discriminator that learns to distinguish the expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator; then, we design diffusion rewards based on the classifier's output for policy learning. We conduct extensive experiments in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualizations of the learned reward functions of GAIL and DRAIL suggest that DRAIL produces more precise and smoother rewards.
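The mechanism described in the abstract lends itself to a compact sketch. Below is a minimal, hedged illustration of the general idea, assuming a simple MLP epsilon-predictor over concatenated state-action vectors, a linear noise schedule, and a sigmoid mapping from denoising error to an "expert-likeness" probability; the names (`DiffusionDiscriminator`, `diffusion_reward`) and every design choice here are illustrative assumptions, not the paper's exact architecture or reward formulation.

```python
import torch
import torch.nn as nn


class DiffusionDiscriminator(nn.Module):
    """Scores state-action pairs by how well a small denoising network
    reconstructs the noise added to them (illustrative sketch only)."""

    def __init__(self, sa_dim: int, hidden: int = 256, n_timesteps: int = 100):
        super().__init__()
        self.n_timesteps = n_timesteps
        # Linear noise schedule: beta_t controls the noise injected at step t.
        beta = torch.linspace(1e-4, 2e-2, n_timesteps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - beta, dim=0))
        # Epsilon-predictor conditioned on the noised sample and the timestep.
        self.eps_net = nn.Sequential(
            nn.Linear(sa_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, sa_dim),
        )
        # Learnable threshold calibrating denoising error -> probability.
        self.bias = nn.Parameter(torch.zeros(1))

    def denoising_error(self, sa: torch.Tensor) -> torch.Tensor:
        """Per-sample epsilon-prediction MSE at a random diffusion step."""
        t = torch.randint(0, self.n_timesteps, (sa.shape[0],), device=sa.device)
        a_bar = self.alpha_bar[t].unsqueeze(-1)
        eps = torch.randn_like(sa)
        noised = a_bar.sqrt() * sa + (1.0 - a_bar).sqrt() * eps
        t_emb = t.float().unsqueeze(-1) / self.n_timesteps
        eps_hat = self.eps_net(torch.cat([noised, t_emb], dim=-1))
        return ((eps_hat - eps) ** 2).mean(dim=-1)

    def classify(self, sa: torch.Tensor) -> torch.Tensor:
        """Low denoising error -> probability near 1 ('expert-like')."""
        return torch.sigmoid(self.bias - self.denoising_error(sa))


def discriminator_loss(disc, expert_sa, agent_sa):
    """Binary-classification surrogate: learn to denoise expert pairs
    well and agent pairs poorly."""
    d_exp, d_agt = disc.classify(expert_sa), disc.classify(agent_sa)
    return -(torch.log(d_exp + 1e-8).mean() + torch.log(1 - d_agt + 1e-8).mean())


def diffusion_reward(disc, states, actions, eps=1e-8):
    """GAIL-style reward r = log D - log(1 - D) from the classifier output."""
    with torch.no_grad():
        d = disc.classify(torch.cat([states, actions], dim=-1)).clamp(eps, 1 - eps)
    return torch.log(d) - torch.log(1.0 - d)
```

The intuition the sketch captures: because the score is derived from a denoising objective evaluated across noise levels rather than a single sharp decision boundary, it tends to vary smoothly with the input, which is consistent with the paper's reported smoother reward landscapes.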
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
Adversarial variants of Imitation Learning and Inverse Reinforcement Learning offer an alternative by learning policies from expert demonstrations via a discriminator.
We propose RILe, a teacher-student system that achieves both robustness to imperfect data and efficiency.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- DiffAIL: Diffusion Adversarial Imitation Learning [32.90853955228524]
Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks.
We propose a method named diffusion adversarial imitation learning (DiffAIL).
Our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks.
arXiv Detail & Related papers (2023-12-11T12:53:30Z)
- Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels for the unlabeled samples based on the confidence of the preference predictor; a minimal sketch of this filtering step follows this entry.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
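A minimal sketch of that confidence-based pseudo-labeling, assuming a `predictor` callable that returns P(segment A preferred) for a batch of segment pairs; the helper name, interface, and threshold value are hypothetical, not SURF's actual code:

```python
import torch

def pseudo_label_preferences(predictor, seg_a, seg_b, threshold=0.95):
    """Keep only unlabeled segment pairs on which the preference
    predictor is confident, and pseudo-label them (illustrative)."""
    with torch.no_grad():
        p = predictor(seg_a, seg_b)         # assumed: P(seg_a preferred), shape (batch,)
    confidence = torch.maximum(p, 1.0 - p)  # confidence in either direction
    keep = confidence >= threshold          # discard uncertain pairs
    labels = (p >= 0.5).float()             # 1 if seg_a preferred, else 0
    return seg_a[keep], seg_b[keep], labels[keep]
```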
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
- $f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning [29.459037918810143]
Imitation learning aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors.
Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency?
We propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure and a policy capable of producing expert-like behaviors.
arXiv Detail & Related papers (2020-10-02T21:39:56Z)
- Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in reinforcement learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm that is sample-efficient and achieves good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.