SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
- URL: http://arxiv.org/abs/2512.18583v1
- Date: Sun, 21 Dec 2025 04:00:38 GMT
- Title: SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
- Authors: Pengcheng Li, Qiang Fang, Tong Zhao, Yixing Lan, Xin Xu
- Abstract summary: Adversarial Imitation Learning (AIL) is a framework that infers rewards from expert demonstrations to guide policy optimization. We propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. Our method achieves an average return of 3441, surpassing the state-of-the-art method by 89.
- Score: 24.224053689246748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Imitation Learning (AIL) is a dominant framework in imitation learning that infers rewards from expert demonstrations to guide policy optimization. Although providing more expert demonstrations typically leads to improved performance and greater stability, collecting such demonstrations can be challenging in certain scenarios. Inspired by the success of diffusion models in data generation, we propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. We first employ a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data that augment the expert demonstrations. To selectively replay the most valuable demonstrations from the large pool of (pseudo-) expert demonstrations, we further introduce a prioritized expert demonstration replay strategy (PEDR). The experimental results on simulation tasks demonstrate the effectiveness and robustness of our method. In particular, in the Hopper task, our method achieves an average return of 3441, surpassing the state-of-the-art method by 89. Our code will be available at https://github.com/positron-lpc/SD2AIL.
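The abstract names two concrete mechanisms: a diffusion model that synthesizes pseudo-expert transitions to augment the expert pool, and a prioritized expert demonstration replay (PEDR) that favors the most valuable (pseudo-)expert samples. Below is a minimal Python sketch of a prioritized replay pool in that spirit; the class name, the priority heuristic, and all hyperparameters are illustrative assumptions, not the authors' implementation (see their linked repository for the real code).

```python
# Illustrative sketch of the two ideas named in the abstract: one buffer
# holding both real expert and diffusion-sampled pseudo-expert transitions,
# replayed with priorities (PEDR-style). NOT the authors' code; names and
# hyperparameters are assumptions.
import numpy as np

class PrioritizedExpertReplay:
    """Samples (pseudo-)expert transitions in proportion to a priority score."""

    def __init__(self, alpha: float = 0.6):
        self.transitions = []   # list of (state, action) pairs
        self.priorities = []    # one scalar priority per transition
        self.alpha = alpha      # how strongly priorities skew sampling

    def add(self, transition, priority: float = 1.0):
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int, rng: np.random.Generator):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx], idx

    def update_priorities(self, idx, new_priorities):
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = float(pr)

# Hypothetical usage: real and diffusion-generated pseudo-expert transitions
# share one buffer; priorities could come from the discriminator's output.
rng = np.random.default_rng(0)
buffer = PrioritizedExpertReplay()
for sa in [(np.zeros(11), np.zeros(3))] * 8:   # stand-in expert transitions
    buffer.add(sa, priority=1.0)
batch, idx = buffer.sample(batch_size=4, rng=rng)
buffer.update_priorities(idx, rng.uniform(0.5, 1.5, size=len(idx)))
```

In an actual SD2AIL-style loop, the priorities would presumably be refreshed from the discriminator's scores at each update, so that transitions the discriminator still finds informative are replayed more often.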
Related papers
- DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On [103.89972383310715]
DiffusionTrend harnesses latent information rich in priors to capture the nuances of garment details. It delivers a visually compelling try-on experience, underscoring the potential of training-free diffusion models.
arXiv Detail & Related papers (2024-12-19T02:24:35Z)
- Imitation Learning from Purified Demonstrations [47.52316615371601]
We propose to first purify the potential noise in imperfect demonstrations, and subsequently conduct imitation learning from these purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
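The purification idea lends itself to a short sketch: diffuse an imperfect demonstration part-way with the forward (noising) process, then run a learned reverse process back to wash out non-expert noise. Everything below (the schedule, the step count, the stand-in denoiser) is an assumption for illustration, not the paper's implementation.

```python
# Minimal sketch of diffusion-based demonstration purification: forward-noise
# an imperfect demonstration to timestep t_star, then iteratively denoise
# back to t = 0. `denoiser` stands in for a trained model.
import numpy as np

def purify(demo, denoiser, t_star=10, betas=None, rng=None):
    """demo: (T, d) array of state-action vectors from an imperfect policy."""
    rng = rng or np.random.default_rng(0)
    betas = betas if betas is not None else np.linspace(1e-4, 0.02, 50)
    alphas_bar = np.cumprod(1.0 - betas)

    # Forward process: jump straight to timestep t_star in closed form.
    x = np.sqrt(alphas_bar[t_star]) * demo + \
        np.sqrt(1.0 - alphas_bar[t_star]) * rng.standard_normal(demo.shape)

    # Reverse process: iteratively denoise back toward the data manifold.
    for t in range(t_star, 0, -1):
        x = denoiser(x, t)  # a trained model would predict the less-noisy x
    return x

# Stand-in denoiser (a real one would be a trained network).
toy_denoiser = lambda x, t: 0.95 * x
purified = purify(np.ones((100, 14)), toy_denoiser)
```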
arXiv Detail & Related papers (2023-10-11T02:36:52Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
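A hedged sketch of the positive-unlabeled objective this summary describes: expert demonstrations are treated as positives, the mixed-quality pool as unlabeled, and the discriminator is trained with a non-negative PU risk estimator (in the style of Kiryo et al., 2017). Treating this as the paper's exact loss is an assumption.

```python
# Non-negative PU discriminator loss: expert data = positive class,
# mixed-quality data = unlabeled. Illustrative, not the paper's exact loss.
import torch
import torch.nn.functional as F

def nn_pu_loss(logits_pos, logits_unl, prior=0.5):
    """logits_*: discriminator logits; prior: assumed fraction of
    expert-quality data hidden inside the unlabeled pool."""
    # Risk of labeling positives as "positive" and as "negative".
    r_p_pos = F.binary_cross_entropy_with_logits(
        logits_pos, torch.ones_like(logits_pos))
    r_p_neg = F.binary_cross_entropy_with_logits(
        logits_pos, torch.zeros_like(logits_pos))
    # Risk of labeling unlabeled samples as "negative".
    r_u_neg = F.binary_cross_entropy_with_logits(
        logits_unl, torch.zeros_like(logits_unl))
    # Non-negative correction keeps the estimated negative risk >= 0.
    return prior * r_p_pos + torch.clamp(r_u_neg - prior * r_p_neg, min=0.0)

loss = nn_pu_loss(torch.randn(32), torch.randn(64), prior=0.3)
```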
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumption so that the demonstrator and the imitator need only share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly-emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones, taking a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can learn optimal policies in a more sample-efficient way than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
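One way to picture the backwards-model trick: roll a learned reverse-dynamics model backwards from demonstration states, producing trajectories that funnel into the demonstrations and thereby widen the region of attraction for behavior cloning. The sketch below is illustrative; the stand-in backward model and horizon are assumptions, not the paper's architecture.

```python
# Illustrative backwards-model data augmentation: synthesize predecessor
# states leading into demonstration states, then reverse each rollout so it
# ends on the demonstration. Stand-in model, not the paper's.
import numpy as np

def backward_rollouts(demo_states, backward_model, horizon=5, rng=None):
    """demo_states: (N, d). backward_model(s, rng) -> plausible predecessor."""
    rng = rng or np.random.default_rng(0)
    augmented = []
    for s in demo_states:
        traj = [s]
        for _ in range(horizon):
            traj.append(backward_model(traj[-1], rng))
        augmented.append(np.stack(traj[::-1]))  # reversed: off-course -> demo
    return augmented

# Stand-in backward model: predecessor = state plus small noise.
toy_backward = lambda s, rng: s + 0.05 * rng.standard_normal(s.shape)
rollouts = backward_rollouts(np.zeros((3, 4)), toy_backward)
```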
arXiv Detail & Related papers (2022-10-17T18:02:19Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
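The shaping term described here has a standard potential-based form, r' = r + gamma * Phi(s', a') - Phi(s, a), with the potential Phi given by a generative model's score of a state-action pair. The sketch below assumes that form; the stand-in log-density is hypothetical.

```python
# Potential-based reward shaping with a state-and-action-dependent potential.
# The potential is the log-density of (s, a) under a generative model (a flow
# or GAN in the paper); the toy log-density below is a stand-in.
import numpy as np

def shaped_reward(r, s, a, s_next, a_next, log_density, gamma=0.99):
    """r' = r + gamma * phi(s', a') - phi(s, a)."""
    phi = log_density  # potential = generative model's score of (s, a)
    return r + gamma * phi(s_next, a_next) - phi(s, a)

# Stand-in log-density; a real one would come from a model trained on
# demonstrations, making demonstrated regions high-potential.
toy_logp = lambda s, a: -0.5 * (np.dot(s, s) + np.dot(a, a))
print(shaped_reward(0.0, np.zeros(3), np.zeros(2),
                    np.ones(3), np.ones(2), toy_logp))
```

Because the shaping is potential-based, it biases exploration toward demonstrated regions without changing which policies are optimal.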
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
- Intrinsic Reward Driven Imitation Learning via Generative Model [48.97800481338626]
Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration.
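As a rough illustration of a generative-model-driven intrinsic reward: fit a model to the demonstration's transitions and pay the agent its prediction error as an intrinsic signal. The one-layer network and error-based reward below are assumptions for illustration, not the paper's exact module.

```python
# Intrinsic reward from a learned transition model: the agent is rewarded by
# the model's prediction error on its transitions. Illustrative stand-in for
# the paper's generative reward module.
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim))  # predicts the next observation

    def intrinsic_reward(self, obs, act, next_obs):
        pred = self.net(torch.cat([obs, act], dim=-1))
        # Larger prediction error -> larger intrinsic reward for the agent.
        return ((pred - next_obs) ** 2).mean(dim=-1)

model = TransitionModel(obs_dim=8, act_dim=2)
r_int = model.intrinsic_reward(torch.zeros(5, 8), torch.zeros(5, 2),
                               torch.ones(5, 8))
```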
arXiv Detail & Related papers (2020-06-26T15:39:40Z)