SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations
- URL: http://arxiv.org/abs/2507.08707v1
- Date: Fri, 11 Jul 2025 16:05:18 GMT
- Title: SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations
- Authors: Peter Crowley, Zachary Serlin, Tyler Paine, Makai Mann, Michael Benjamin, Calin Belta
- Abstract summary: Inverse Reinforcement Learning presents a powerful paradigm for learning complex robotic tasks from human demonstrations. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH). We empirically validate SPLASH on a maritime capture-the-flag task in simulation, and demonstrate real-world applicability with sim-to-real translation experiments on autonomous surface vehicles.
- Score: 1.4793622723642046
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Inverse Reinforcement Learning (IRL) presents a powerful paradigm for learning complex robotic tasks from human demonstrations. However, most approaches assume that expert demonstrations are available, which is often not the case. Those that allow for suboptimality in the demonstrations are not designed for long-horizon goals or adversarial tasks. Many desirable robot capabilities fall into one or both of these categories, highlighting a critical shortcoming in the ability of IRL to produce field-ready robotic agents. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH), which advances the state-of-the-art in learning from suboptimal demonstrations to long-horizon and adversarial settings. We empirically validate SPLASH on a maritime capture-the-flag task in simulation, and demonstrate real-world applicability with sim-to-real translation experiments on autonomous unmanned surface vehicles. We show that our proposed methods allow SPLASH to significantly outperform the state-of-the-art in reward learning from suboptimal demonstrations.
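Preference-based reward learning of the kind SPLASH builds on is commonly framed as fitting a reward model to pairwise trajectory comparisons under a Bradley-Terry likelihood. The sketch below is an illustrative toy example of that general idea, not the authors' implementation: it fits a linear reward to synthetic preference pairs, and the feature construction, learning rate, and finite-difference optimizer are all assumptions made for the example.

```python
import numpy as np

def traj_return(w, feats):
    """Return of a trajectory under a linear reward r(s) = w . phi(s);
    feats is a (T, d) matrix of per-step features."""
    return float((feats @ w).sum())

def bradley_terry_loss(w, pairs):
    """Mean negative log-likelihood that the preferred trajectory in each
    (better, worse) pair has the higher return under weights w."""
    loss = 0.0
    for better, worse in pairs:
        diff = traj_return(w, better) - traj_return(w, worse)
        loss += np.log1p(np.exp(-diff))  # -log sigmoid(diff)
    return loss / len(pairs)

# Synthetic preferences: preferred trajectories have a larger first feature.
rng = np.random.default_rng(0)
pairs = []
for _ in range(50):
    better = rng.normal([1.0, 0.0], 0.1, size=(10, 2))
    worse = rng.normal([0.0, 0.0], 0.1, size=(10, 2))
    pairs.append((better, worse))

# Crude gradient descent via central finite differences (sketch only).
w = np.zeros(2)
eps, lr = 1e-4, 0.5
for _ in range(200):
    grad = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        grad[i] = (bradley_terry_loss(w + e, pairs)
                   - bradley_terry_loss(w - e, pairs)) / (2 * eps)
    w -= lr * grad

print(np.round(w, 2))  # the first weight learns to be positive
```

After training, the learned weight on the discriminative first feature is positive while the uninformative second weight stays near zero, so the recovered reward ranks preferred trajectories above dispreferred ones without ever observing ground-truth rewards.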
Related papers
- Curating Demonstrations using Online Experience [52.59275477573012]
We show that Demo-SCORE can effectively identify suboptimal demonstrations without manual curation. Demo-SCORE achieves 15-35% higher absolute success rate in the resulting policy compared to the base policy trained with all original demonstrations.
arXiv Detail & Related papers (2025-03-05T17:58:16Z) - Subtask-Aware Visual Reward Learning from Segmented Demonstrations [97.80917991633248]
This paper introduces REDS: REward learning from Demonstration with Segmentations, a novel reward learning framework. We train a dense reward function conditioned on video segments and their corresponding subtasks to ensure alignment with ground-truth reward signals. Our experiments show that REDS significantly outperforms baseline methods on complex robotic manipulation tasks in Meta-World.
arXiv Detail & Related papers (2025-02-28T01:25:37Z) - Toward Information Theoretic Active Inverse Reinforcement Learning [0.21990652930491852]
Inverse reinforcement learning (IRL) offers a promising approach to infer the unknown reward from demonstrations. Active IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration. We provide an information-theoretic acquisition function, propose an efficient approximation scheme, and illustrate its performance through a set of gridworld experiments.
arXiv Detail & Related papers (2024-12-31T10:32:24Z) - Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker [9.6508237676589]
A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations. We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR). ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations.
arXiv Detail & Related papers (2024-12-28T16:06:44Z) - Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools [14.069149456110676]
We introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks. We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task. We further substantiate our approach with experimental trials on real-world robotic platforms.
arXiv Detail & Related papers (2023-11-05T22:43:29Z) - Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z) - Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning exploits the agent's past good experience to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments.
arXiv Detail & Related papers (2022-03-21T11:56:56Z) - Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning learns effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.