Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations
- URL: http://arxiv.org/abs/2505.21182v1
- Date: Tue, 27 May 2025 13:33:21 GMT
- Title: Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations
- Authors: Huy Hoang, Tien Mai, Pradeep Varakantham, Tanvi Verma
- Abstract summary: We study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Our method avoids adversarial training and handles both positive and negative demonstrations in a unified framework.
- Score: 10.679604514849744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline imitation learning typically learns from expert and unlabeled demonstrations, yet often overlooks the valuable signal in explicitly undesirable behaviors. In this work, we study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Although the resulting objective is a DC (Difference-of-Convex) program, we prove that it becomes convex when expert demonstrations outweigh undesirable demonstrations, enabling a practical and stable non-adversarial training objective. Our method avoids adversarial training and handles both positive and negative demonstrations in a unified framework. Extensive experiments on standard offline imitation learning benchmarks demonstrate that our approach consistently outperforms state-of-the-art baselines.
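The precise objective is not reproduced on this page, but as a rough sketch of the kind of formulation the abstract describes (with $d^{\pi}$, $d^{E}$, and $d^{B}$ denoting the state-action visitation distributions of the learned policy, the expert data, and the undesirable data, and $\alpha > 0$ a weighting coefficient assumed here purely for illustration), one could write:

$$
\min_{\pi} \;\; \mathrm{KL}\!\left(d^{\pi} \,\middle\|\, d^{E}\right) \;-\; \alpha\, \mathrm{KL}\!\left(d^{\pi} \,\middle\|\, d^{B}\right)
$$

Each KL term is convex in $d^{\pi}$, so their difference is in general a DC (Difference-of-Convex) program; the abstract's claim is that when the expert data sufficiently outweighs the undesirable data (informally, when the subtracted term is dominated), the overall objective remains convex, which is what enables the stable, non-adversarial training procedure.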
Related papers
- Imitation Learning via Focused Satisficing [6.745370992941109]
Imitation learning assumes that demonstrations are close to optimal according to some fixed, but unknown, cost function. We show experimentally that this focuses the policy to imitate the highest quality (portions of) demonstrations better than existing imitation learning methods.
arXiv Detail & Related papers (2025-05-20T18:36:52Z) - Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning [17.601574372211232]
We address the challenge of offline reinforcement learning using realistic data, specifically non-expert data collected through sub-optimal behavior policies. Under such circumstances, the learned policy must be safe enough to manage distribution shift while maintaining sufficient flexibility to deal with bad demonstrations from offline data. We introduce a novel method called Outcome-Driven Action Flexibility (ODAF), which seeks to reduce reliance on the empirical action distribution of the behavior policy, hence reducing the negative impact of those bad demonstrations.
arXiv Detail & Related papers (2025-04-02T13:27:44Z) - Offline Imitation Learning with Model-based Reverse Augmentation [48.64791438847236]
We propose a novel model-based framework, called offline Imitation Learning with Self-paced Reverse Augmentation.
Specifically, we build a reverse dynamic model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states.
We then use reinforcement learning to learn from the augmented trajectories and transition from expert-unobserved states to expert-observed states.
arXiv Detail & Related papers (2024-06-18T12:27:02Z) - Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals. Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Imitation Learning from Purified Demonstrations [47.52316615371601]
We propose to first purify the potential noise in imperfect demonstrations, and subsequently conduct imitation learning from these purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
arXiv Detail & Related papers (2023-10-11T02:36:52Z) - Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to those optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z) - Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z) - Robust Imitation Learning from Corrupted Demonstrations [15.872598211059403]
We consider offline Imitation Learning from corrupted demonstrations, where a constant fraction of the data can be noisy or even arbitrary outliers.
We propose a novel robust algorithm that minimizes a Median-of-Means (MOM) objective, which guarantees accurate estimation of the policy.
Our experiments on continuous-control benchmarks validate that our method exhibits the predicted robustness and effectiveness.
arXiv Detail & Related papers (2022-01-29T14:21:28Z) - Unsupervised Embedding Learning from Uncertainty Momentum Modeling [37.674449317054716]
We propose a novel solution to explicitly model and explore the uncertainty of the given unlabeled learning samples.
We leverage such uncertainty modeling momentum during learning, which helps tackle outliers.
arXiv Detail & Related papers (2021-07-19T14:06:19Z) - A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data.
We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision.
Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)