Related papers: Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

URL: http://arxiv.org/abs/2507.06701v1
Date: Wed, 09 Jul 2025 09:55:23 GMT
Title: Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement
Authors: Michael Bloesch, Markus Wulfmeier, Philemon Brakel, Todor Davchev, Martina Zambelli, Jost Tobias Springenberg, Abbas Abdolmaleki, William F Whitney, Nicolas Heess, Roland Hafner, Martin Riedmiller,
Abstract summary: Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale.<n>This paper investigates idealized scenarios with mostly bimodal-quality data distributions and introduces a method to learn from such data.<n>Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data.
Score: 19.883973457999282
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale: Unlike behavior cloning or offline reinforcement learning, IfO can leverage action-free demonstrations and thus circumvents the need for costly action-labeled demonstrations or reward functions. However, current IfO research focuses on idealized scenarios with mostly bimodal-quality data distributions, restricting the meaningfulness of the results. In contrast, this paper investigates more nuanced distributions and introduces a method to learn from such data, moving closer to a paradigm in which imitation learning can be performed iteratively via self-improvement. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data. Through comprehensive evaluation, we delineate the relation between different data distributions and the applicability of algorithms and highlight the limitations of established methods. Our findings provide valuable insights for developing more robust and practical IfO techniques on a path to scalable behaviour learning.

Related papers

LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics [10.638045151201084]
We present a principled taxonomy of twelve recent stateful unlearning methods.<n>We revisit the evaluation of unlearning effectiveness (UE), utility retention (UT), and robustness (Rob)<n>Our analysis shows that current evaluations, dominated by multiple-choice question (MCQ) accuracy, offer only a narrow perspective.
arXiv Detail & Related papers (2025-10-08T23:47:05Z)
Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining.<n>This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning)<n>We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.<n>It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
Variational Self-Supervised Contrastive Learning Using Beta Divergence [0.0]
We present a contrastive self-supervised learning method which is robust to data noise, grounded in the domain of variational methods. We demonstrate the effectiveness of the proposed method through rigorous experiments including linear evaluation and fine-tuning scenarios with multi-label datasets in the face understanding domain.
arXiv Detail & Related papers (2023-09-05T17:21:38Z)
Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL. Our approach learns from passive data by modeling intentions. Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
ArCL: Enhancing Contrastive Learning with Augmentation-Robust Representations [30.745749133759304]
We develop a theoretical framework to analyze the transferability of self-supervised contrastive learning. We show that contrastive learning fails to learn domain-invariant features, which limits its transferability. Based on these theoretical insights, we propose a novel method called Augmentation-robust Contrastive Learning (ArCL)
arXiv Detail & Related papers (2023-03-02T09:26:20Z)
Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels. We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model. Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning. To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.