Imitation Learning via Focused Satisficing
- URL: http://arxiv.org/abs/2505.14820v2
- Date: Sun, 25 May 2025 16:55:31 GMT
- Title: Imitation Learning via Focused Satisficing
- Authors: Rushit N. Shah, Nikolaos Agadakos, Synthia Sasulski, Ali Farajzadeh, Sanjiban Choudhury, Brian Ziebart
- Abstract summary: Imitation learning assumes that demonstrations are close to optimal according to some fixed, but unknown, cost function. We show experimentally that this focuses the policy to imitate the highest quality (portions of) demonstrations better than existing imitation learning methods.
- Score: 6.745370992941109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning often assumes that demonstrations are close to optimal according to some fixed, but unknown, cost function. However, according to satisficing theory, humans often choose acceptable behavior based on their personal (and potentially dynamic) levels of aspiration, rather than achieving (near-) optimality. For example, a lunar lander demonstration that successfully lands without crashing might be acceptable to a novice despite being slow or jerky. Using a margin-based objective to guide deep reinforcement learning, our focused satisficing approach to imitation learning seeks a policy that surpasses the demonstrator's aspiration levels -- defined over trajectories or portions of trajectories -- on unseen demonstrations without explicitly learning those aspirations. We show experimentally that this focuses the policy to imitate the highest quality (portions of) demonstrations better than existing imitation learning methods, providing much higher rates of guaranteed acceptability to the demonstrator, and competitive true returns on a range of environments.
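As a rough illustration of the margin-based idea (a minimal sketch only, not the paper's focused-satisficing objective; the cost network, segment features, and fixed margin below are assumptions made for the example), one can write a hinge loss that is zero only when the policy's trajectory segments undercut the cost of the corresponding demonstration segments by a margin under a learned cost function:
```python
# Minimal sketch (assumed names, not the paper's implementation): a margin-based
# loss over trajectory segments, encouraging the policy to beat each demonstration
# segment's learned cost by a fixed margin.
import torch
import torch.nn as nn


class CostNet(nn.Module):
    """Small MLP mapping a state-action feature vector to a scalar cost."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats).squeeze(-1)  # per-step costs, shape (..., T)


def margin_loss(cost_net: CostNet,
                policy_segments: torch.Tensor,  # (B, T, feat_dim) policy rollout segments
                demo_segments: torch.Tensor,    # (B, T, feat_dim) demonstration segments
                margin: float = 1.0) -> torch.Tensor:
    """Hinge loss that vanishes only when each policy segment's total cost is
    lower than the matching demonstration segment's cost by at least `margin`."""
    policy_cost = cost_net(policy_segments).sum(dim=-1)  # (B,) total cost per segment
    demo_cost = cost_net(demo_segments).sum(dim=-1)      # (B,)
    return torch.relu(policy_cost - demo_cost + margin).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    feat_dim, batch, horizon = 8, 4, 16
    net = CostNet(feat_dim)
    policy_segs = torch.randn(batch, horizon, feat_dim)
    demo_segs = torch.randn(batch, horizon, feat_dim)
    print(margin_loss(net, policy_segs, demo_segs).item())
```
Operating on segments rather than whole trajectories mirrors the abstract's notion of aspiration levels defined over portions of trajectories; the actual objective and training procedure are given in the paper.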
Related papers
- Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations [10.679604514849744]
We study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Our method avoids adversarial training and handles both positive and negative demonstrations in a unified framework.
arXiv Detail & Related papers (2025-05-27T13:33:21Z) - Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker [9.6508237676589]
A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations. We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR). ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations.
arXiv Detail & Related papers (2024-12-28T16:06:44Z) - Zero-Shot Offline Imitation Learning via Optimal Transport [21.548195072895517]
Zero-shot imitation learning algorithms reproduce unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector and a low-level goal-conditioned policy. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning.
arXiv Detail & Related papers (2024-10-11T12:10:51Z) - Imitation Learning from Purified Demonstrations [47.52316615371601]
We propose to purify the potential noises in imperfect demonstrations first, and subsequently conduct imitation learning from these purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
arXiv Detail & Related papers (2023-10-11T02:36:52Z) - Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z) - Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z) - Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Learning from Imperfect Demonstrations via Adversarial Confidence Transfer [44.14553613304978]
We study the problem of learning from imperfect demonstrations by learning a confidence predictor.
We learn a common latent space through adversarial distribution matching of multi-length partial trajectories.
Our experiments in three simulated environments and a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
arXiv Detail & Related papers (2022-02-07T06:33:35Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.