Related papers: GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation

URL: http://arxiv.org/abs/2401.00929v2
Date: Fri, 14 Jun 2024 14:10:19 GMT
Title: GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
Authors: Zifan Wang, Junyu Chen, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi,
Abstract summary: GenH2R is a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. We acquire such generalizability by learning H2R handover at scale with a comprehensive solution. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation.
Score: 31.702907860448477
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents GenH2R, a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. The goal is to equip robots with the ability to reliably receive objects with unseen geometry handed over by humans in various complex trajectories. We acquire such generalizability by learning H2R handover at scale with a comprehensive solution including procedural simulation assets creation, automated demonstration generation, and effective imitation learning. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation to create an H2R handover simulation environment named \simabbns, surpassing the number of scenes in existing simulators by three orders of magnitude. We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning. Finally, we present a 4D imitation learning method augmented by a future forecasting objective to distill demonstrations into a visuo-motor handover policy. Experimental evaluations in both simulators and the real world demonstrate significant improvements (at least +10\% success rate) over baselines in all cases. The project page is https://GenH2R.github.io/.

Related papers

RoboPearls: Editable Video Simulation for Robot Manipulation [81.18434338506621]
RoboPearls is an editable video simulation framework for robotic manipulation.<n>Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations.<n>We conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot.
arXiv Detail & Related papers (2025-06-28T05:03:31Z)
Object-centric 3D Motion Field for Robot Learning from Human Videos [56.9436352861611]
We propose to use object-centric 3D motion field to represent actions for robot learning from human videos.<n>We present a novel framework for extracting this representation from videos for zero-shot control.<n> Experiments show that our method reduces 3D motion estimation error by over 50% compared to the latest method.
arXiv Detail & Related papers (2025-06-04T17:59:06Z)
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation [53.63540587160549]
VidBot is a framework enabling zero-shot robotic manipulation using learned 3D affordance from in-the-wild monocular RGB-only human videos. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.
arXiv Detail & Related papers (2025-03-10T10:04:58Z)
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation [22.43711565969091]
Articubot is a system that learns a policy to open diverse categories of unseen articulated objects in the real world. We show that our learned policy can zero-shot transfer to three different real robot settings.
arXiv Detail & Related papers (2025-03-04T22:51:50Z)
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation [62.5805866419814]
Vid2Sim is a novel framework that bridges the sim2real gap through a scalable and cost-efficient real2sim pipeline for neural 3D scene reconstruction and simulation. Experiments demonstrate that Vid2Sim significantly improves the performance of urban navigation in the digital twins and real world by 31.2% and 68.3% in success rate.
arXiv Detail & Related papers (2025-01-12T03:01:15Z)
Robot Learning with Super-Linear Scaling [20.730206708381704]
CASHER is a pipeline for scaling up data collection and learning in simulation where the performance scales superlinearly with human effort. We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan without any additional human effort.
arXiv Detail & Related papers (2024-12-02T18:12:02Z)
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z)
DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots [15.034811470942962]
We present DemoStart, a novel auto-curriculum reinforcement learning method capable of learning complex manipulation behaviors on an arm equipped with a three-fingered robotic hand. Learning from simulation drastically reduces the development cycle of behavior generation, and domain randomization techniques are leveraged to achieve successful zero-shot sim-to-real transfer.
arXiv Detail & Related papers (2024-09-10T16:05:25Z)
OmniRe: Omni Urban Scene Reconstruction [78.99262488964423]
We introduce OmniRe, a comprehensive system for creating high-fidelity digital twins of dynamic real-world scenes from on-device logs. Our approach builds scene graphs on 3DGS and constructs multiple Gaussian representations in canonical spaces that model various dynamic actors.
arXiv Detail & Related papers (2024-08-29T17:56:33Z)
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots [25.650235551519952]
We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning.
arXiv Detail & Related papers (2024-06-04T17:41:31Z)
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model. It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
Part-Guided 3D RL for Sim2Real Articulated Object Manipulation [27.422878372169805]
We propose a part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. A single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation.
arXiv Detail & Related papers (2024-04-26T10:18:17Z)
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models [17.757495961816783]
Gen2Sim is a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions. Our work contributes hundreds of simulated assets, tasks and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
arXiv Detail & Related papers (2023-10-27T17:55:32Z)
Learning Interactive Real-World Simulators [96.5991333400566]
We explore the possibility of learning a universal simulator of real-world interaction through generative modeling. We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies. Video captioning models can benefit from training with simulated experience, opening up even wider applications.
arXiv Detail & Related papers (2023-10-09T19:42:22Z)
GINA-3D: Learning to Generate Implicit Neural Assets in the Wild [38.51391650845503]
GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians. We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians. We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
arXiv Detail & Related papers (2023-04-04T23:41:20Z)
Robot Learning from Randomized Simulations: A Review [59.992761565399185]
Deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. State-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive. We focus on a technique named 'domain randomization' which is a method for learning from randomized simulations.
arXiv Detail & Related papers (2021-11-01T13:55:41Z)
IGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks [60.930678878024366]
We present iGibson 2.0, a simulation environment that supports the simulation of a more diverse set of household tasks. First, iGibson 2.0 supports object states, including temperature, wetness level, cleanliness level, and toggled and sliced states. Second, iGibson 2.0 implements a set of predicate logic functions that map the simulator states to logic states like Cooked or Soaked. Third, iGibson 2.0 includes a virtual reality (VR) interface to immerse humans in its scenes to collect demonstrations.
arXiv Detail & Related papers (2021-08-06T18:41:39Z)
iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects. iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.