Related papers: Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space

URL: http://arxiv.org/abs/2505.17389v1
Date: Fri, 23 May 2025 01:57:45 GMT
Title: Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space
Authors: Jinrong Yang, Kexun Chen, Zhuoling Li, Shengkai Wu, Yong Zhao, Liangliang Ren, Wenqiu Luo, Chaohui Shang, Meiyu Zhi, Linfeng Gao, Mingshan Sun, Hui Cheng
Abstract summary: Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. We introduce a Hierarchical Data Collection Space (HD-Space) for robotic imitation learning, a simple data collection scheme. We conduct empirical evaluations across two simulated and five real-world long-horizon manipulation tasks.
Score: 16.787049521081983
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. While minimal demonstrations enable robotic action execution, achieving high success rates and generalization comes at a high cost, e.g., continuously adding data or incrementally conducting human-in-the-loop processes with complex hardware/software systems. In this paper, we rethink the state/action space of the data collection pipeline as well as the underlying factors responsible for the prediction of non-robust actions. To this end, we introduce a Hierarchical Data Collection Space (HD-Space) for robotic imitation learning, a simple data collection scheme that lets the model train on proactive, high-quality data. Specifically, we segment the fine manipulation task into multiple key atomic tasks from a high-level perspective and design atomic state/action spaces for human demonstrations, aiming to generate robust IL data. We conduct empirical evaluations across two simulated and five real-world long-horizon manipulation tasks and demonstrate that IL policy training with HD-Space-based data can achieve significantly enhanced policy performance. HD-Space allows the use of a small amount of demonstration data to train a more powerful policy, particularly for long-horizon manipulation tasks. We aim for HD-Space to offer insights into optimizing data quality and guiding data scaling. Project page: https://hd-space-robotics.github.io

Related papers

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos [66.62109400603394]
We introduce Being-H0, a dexterous Vision-Language-Action model trained on large-scale human videos. Our approach centers on physical instruction tuning, a novel training paradigm that combines large-scale VLA pretraining from human videos, physical space alignment for 3D reasoning, and post-training adaptation for robotic tasks. We empirically show the excellence of Being-H0 in hand motion generation and instruction following, and it also scales well with model and data sizes.
arXiv Detail & Related papers (2025-07-21T13:19:09Z)
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning [46.57163859424286]
This paper presents HumanoidGen, an automated task creation and demonstration collection framework. Specifically, we provide spatial annotations for both assets and dexterous hands based on the atomic operations. In experiments, we create a novel benchmark with augmented scenarios to evaluate the quality of the collected data.
arXiv Detail & Related papers (2025-07-01T15:04:38Z)
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics [11.901933884058021]
We introduce the LAMBDA benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities). Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. We find that learning methods, even when pretrained, yield lower success rates, while a neuro-symbolic method performs significantly better and requires less data.
arXiv Detail & Related papers (2024-11-28T19:31:50Z)
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation [37.73074657448699]
ManiBox is a novel bounding-box-guided manipulation method built on a simulation-based teacher-student framework. ManiBox demonstrates a marked improvement in spatial grasping generalization and adaptability to diverse objects and backgrounds.
arXiv Detail & Related papers (2024-11-04T07:05:02Z)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
JUICER: Data-Efficient Imitation Learning for Robotic Assembly [21.43402768760014]
This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps.
arXiv Detail & Related papers (2024-04-04T18:00:15Z)
Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame. ATM outperforms strong video pre-training baselines by 80% on average. We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states. BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
Low Dimensional State Representation Learning with Reward-shaped Priors [7.211095654886105]
We propose a method that aims at learning a mapping from the observations into a lower-dimensional state space. This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task. We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.
arXiv Detail & Related papers (2020-07-29T13:00:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.