DiffAIL: Diffusion Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2312.06348v2
- Date: Tue, 12 Dec 2023 03:47:38 GMT
- Title: DiffAIL: Diffusion Adversarial Imitation Learning
- Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin
- Abstract summary: Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks.
We propose a method named diffusion adversarial imitation learning (DiffAIL).
Our method achieves state-of-the-art performance and significantly surpasses the expert demonstrations on two benchmark tasks.
- Score: 32.90853955228524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to solve the problem of defining reward
functions in real-world decision-making tasks. The currently popular approach
is the Adversarial Imitation Learning (AIL) framework, which matches expert
state-action occupancy measures to obtain a surrogate reward for forward
reinforcement learning. However, the traditional discriminator is a simple
binary classifier that does not learn an accurate distribution, and it may
therefore fail to identify expert-level state-action pairs induced by the
policy interacting with the environment. To address this issue, we propose a
method named diffusion adversarial imitation learning (DiffAIL), which
introduces the diffusion model into the AIL framework. Specifically, DiffAIL
models state-action pairs with an unconditional diffusion model and uses the
diffusion loss as part of the discriminator's learning objective, which
enables the discriminator to better capture expert demonstrations and improves
generalization. Experimentally, our method achieves state-of-the-art
performance and significantly surpasses the expert demonstrations on benchmark
tasks in both the standard state-action setting and the state-only setting.
Our code is available at https://github.com/ML-Group-SDU/DiffAIL.
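To make the mechanism concrete, the following is a minimal sketch of the core idea described above: train a DDPM-style denoiser on state-action pairs and map its per-sample denoising loss to an expert-likeness score for the discriminator. The class name, network architecture, and noise schedule are illustrative assumptions, not taken from the paper or its released code.

```python
import torch
import torch.nn as nn


class DiffusionDiscriminator(nn.Module):
    """Score state-action pairs by how well a DDPM denoiser reconstructs
    them; pairs that fit the learned (expert) distribution get low loss."""

    def __init__(self, state_dim, action_dim, hidden_dim=256, n_timesteps=100):
        super().__init__()
        self.n_timesteps = n_timesteps
        # Linear DDPM noise schedule; alpha_bar_t = prod_{s<=t} (1 - beta_s).
        betas = torch.linspace(1e-4, 2e-2, n_timesteps)
        self.register_buffer("alpha_bars", torch.cumprod(1.0 - betas, dim=0))
        in_dim = state_dim + action_dim
        # Epsilon predictor conditioned on a normalized diffusion timestep.
        self.eps_net = nn.Sequential(
            nn.Linear(in_dim + 1, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, in_dim),
        )

    def diffusion_loss(self, x):
        """Per-sample denoising loss on x = concat(state, action)."""
        t = torch.randint(0, self.n_timesteps, (x.shape[0],), device=x.device)
        a_bar = self.alpha_bars[t].unsqueeze(-1)
        noise = torch.randn_like(x)
        x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise  # forward process
        t_in = t.float().unsqueeze(-1) / self.n_timesteps
        eps_pred = self.eps_net(torch.cat([x_t, t_in], dim=-1))
        return ((eps_pred - noise) ** 2).mean(dim=-1)

    def forward(self, state, action):
        # Map the denoising loss to a (0, 1) expert-likeness score.
        return torch.exp(-self.diffusion_loss(torch.cat([state, action], dim=-1)))
```

Under a GAIL-style objective, D(s, a) would be pushed toward 1 on expert pairs and toward 0 on policy pairs, and -log(1 - D(s, a)) could then serve as the surrogate reward for forward reinforcement learning.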
Related papers
- Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function.
In recent years, diffusion models have emerged as a non-adversarial alternative to GANs.
We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z)
- Diffusion Imitation from Observation [4.205946699819021]
Adversarial imitation learning approaches learn a generator agent policy to produce state transitions that are indistinguishable, to a discriminator, from expert transitions.
Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework.
arXiv Detail & Related papers (2024-10-07T18:49:55Z)
- Diffusion-Reward Adversarial Imitation Learning [33.81857550294019]
Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments.
Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning.
We propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL.
arXiv Detail & Related papers (2024-05-25T11:53:23Z)
- Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z)
- Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning [51.972577689963714]
Single-demonstration imitation learning (IL) is a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible.
In contrast to typical IL settings, single-demonstration IL involves an agent having access to only one expert trajectory.
We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method.
arXiv Detail & Related papers (2024-02-01T23:06:19Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- DDTSE: Discriminative Diffusion Model for Target Speech Extraction [62.422291953387955]
We introduce the Discriminative Diffusion model for Target Speech Extraction (DDTSE).
We apply the same forward process as diffusion models and utilize the reconstruction loss similar to discriminative methods.
We devise a two-stage training strategy to emulate the inference process during model training.
arXiv Detail & Related papers (2023-09-25T04:58:38Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL), which utilizes a conditional diffusion model to represent the policy; see the sketch after this list.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
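As a companion to the Diffusion-QL entry above, here is a minimal sketch of a state-conditioned diffusion policy: ancestral DDPM sampling turns Gaussian noise into an action. The schedule length, network, and action clamping are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DiffusionPolicy(nn.Module):
    """A minimal state-conditioned DDPM actor: reverse diffusion turns
    Gaussian noise into an action, in the spirit of Diffusion-QL."""

    def __init__(self, state_dim, action_dim, hidden_dim=256, n_timesteps=5):
        super().__init__()
        self.action_dim = action_dim
        self.n_timesteps = n_timesteps
        betas = torch.linspace(1e-4, 0.1, n_timesteps)  # assumed short schedule
        self.register_buffer("betas", betas)
        self.register_buffer("alphas", 1.0 - betas)
        self.register_buffer("alpha_bars", torch.cumprod(1.0 - betas, dim=0))
        self.eps_net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, action_dim),
        )

    @torch.no_grad()
    def act(self, state):
        """Ancestral DDPM sampling of an action conditioned on the state."""
        a = torch.randn(state.shape[0], self.action_dim, device=state.device)
        for t in reversed(range(self.n_timesteps)):
            t_in = torch.full((state.shape[0], 1), t / self.n_timesteps,
                              device=state.device)
            eps = self.eps_net(torch.cat([state, a, t_in], dim=-1))
            alpha_t, a_bar_t = self.alphas[t], self.alpha_bars[t]
            # Posterior mean of the reverse step.
            a = (a - (1.0 - alpha_t) / (1.0 - a_bar_t).sqrt() * eps) / alpha_t.sqrt()
            if t > 0:  # add noise on all but the final step
                a = a + self.betas[t].sqrt() * torch.randn_like(a)
        return a.clamp(-1.0, 1.0)
```

In Diffusion-QL, training would combine a standard denoising loss on dataset actions (analogous to the first sketch) with a Q-value maximization term that steers sampled actions toward high-value regions; this sketch covers only the sampling side.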