DiffAIL: Diffusion Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2312.06348v2
- Date: Tue, 12 Dec 2023 03:47:38 GMT
- Title: DiffAIL: Diffusion Adversarial Imitation Learning
- Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin
- Abstract summary: Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks.
We propose a method named diffusion adversarial imitation learning (DiffAIL)
Our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks.
- Score: 32.90853955228524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to solve the problem of defining reward functions in
real-world decision-making tasks. The current popular approach is the
Adversarial Imitation Learning (AIL) framework, which matches expert
state-action occupancy measures to obtain a surrogate reward for forward
reinforcement learning. However, the traditional discriminator is a simple
binary classifier and doesn't learn an accurate distribution, which may result
in failing to identify expert-level state-action pairs induced by the policy
interacting with the environment. To address this issue, we propose a method
named diffusion adversarial imitation learning (DiffAIL), which introduces the
diffusion model into the AIL framework. Specifically, DiffAIL models the
state-action pairs as unconditional diffusion models and uses diffusion loss as
part of the discriminator's learning objective, which enables the discriminator
to capture better expert demonstrations and improve generalization.
Experimentally, the results show that our method achieves state-of-the-art
performance and significantly surpasses expert demonstration on two benchmark
tasks, including the standard state-action setting and the state-only setting. Our
code is available at https://github.com/ML-Group-SDU/DiffAIL.
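The core idea in the abstract, using an unconditional diffusion model's denoising loss over state-action pairs as the discriminator's signal, can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the toy linear noise schedule and the names `diffusion_loss` and `surrogate_reward` are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_loss(denoiser, x, T=10):
    """Simplified DDPM-style objective: corrupt x at a random diffusion
    step and score how well the denoiser predicts the injected noise.
    Lower loss means x looks more like the data the denoiser was trained
    on (here, expert state-action pairs)."""
    t = int(rng.integers(1, T + 1))
    alpha_bar = 1.0 - t / (T + 1)          # toy noise schedule, not the paper's
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = denoiser(x_t, t)
    return float(np.mean((eps_hat - eps) ** 2))

def surrogate_reward(denoiser, state, action):
    """Map the diffusion loss to a bounded reward for forward RL:
    expert-like pairs (low loss) receive rewards near 1."""
    x = np.concatenate([state, action])
    return float(np.exp(-diffusion_loss(denoiser, x)))
```

In the actual AIL loop, `denoiser` would be trained adversarially on expert versus policy data, and the resulting reward would drive a standard RL algorithm; the sketch only shows how a generative loss can replace a binary classifier's output.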
Related papers
- Diffusion-Reward Adversarial Imitation Learning [33.81857550294019]
Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments.
Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning.
Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL).
arXiv Detail & Related papers (2024-05-25T11:53:23Z)
- Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z)
- Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning [51.972577689963714]
Single-demonstration imitation learning (IL) is a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible.
In contrast to typical IL settings, single-demonstration IL involves an agent having access to only one expert trajectory.
We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method.
arXiv Detail & Related papers (2024-02-01T23:06:19Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
- Robust Generalization despite Distribution Shift via Minimum Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
arXiv Detail & Related papers (2021-06-08T15:25:35Z)
- Spatial Contrastive Learning for Few-Shot Classification [9.66840768820136]
We propose a novel attention-based spatial contrastive objective to learn locally discriminative and class-agnostic features.
With extensive experiments, we show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-26T23:39:41Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME)
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.