Towards Generalizable Zero-Shot Manipulation via Translating Human
Interaction Plans
- URL: http://arxiv.org/abs/2312.00775v1
- Date: Fri, 1 Dec 2023 18:54:12 GMT
- Title: Towards Generalizable Zero-Shot Manipulation via Translating Human
Interaction Plans
- Authors: Homanga Bharadhwaj, Abhinav Gupta, Vikash Kumar, Shubham Tulsiani
- Abstract summary: We show how passive human videos can serve as a rich source of data for learning such generalist robots.
We learn a human plan predictor that, given a current image of a scene and a goal image, predicts the future hand and object configurations.
We show that our learned system can perform over 16 manipulation skills that generalize to 40 objects.
- Score: 58.27029676638521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We pursue the goal of developing robots that can interact zero-shot with
generic unseen objects via a diverse repertoire of manipulation skills and show
how passive human videos can serve as a rich source of data for learning such
generalist robots. Unlike typical robot learning approaches which directly
learn how a robot should act from interaction data, we adopt a factorized
approach that can leverage large-scale human videos to learn how a human would
accomplish a desired task (a human plan), followed by translating this plan to
the robot's embodiment. Specifically, we learn a human plan predictor that,
given a current image of a scene and a goal image, predicts the future hand and
object configurations. We combine this with a translation module that learns a
plan-conditioned robot manipulation policy, and allows following human plans
for generic manipulation tasks in a zero-shot manner with no deployment-time
training. Importantly, while the plan predictor can leverage large-scale human
videos for learning, the translation module only requires a small amount of
in-domain data, and can generalize to tasks not seen during training. We show
that our learned system can perform over 16 manipulation skills that generalize
to 40 objects, encompassing 100 real-world tasks for table-top manipulation and
diverse in-the-wild manipulation. https://homangab.github.io/hopman/
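The abstract only outlines the factorized design. Below is a minimal, illustrative sketch of how the two pieces could fit together; the class names, network shapes, and dimensions (PlanPredictor, TranslationPolicy, plan_dim, a 7-DoF action, etc.) are assumptions made for exposition, not the authors' implementation.
```python
# Illustrative sketch of the factorized "human plan -> robot action" idea.
# All names and shapes here are assumptions for exposition, not the paper's code.
import torch
import torch.nn as nn

class PlanPredictor(nn.Module):
    """Predicts a compact 'human plan' (future hand/object configurations) from a
    current image and a goal image; trainable on large-scale passive human videos."""
    def __init__(self, plan_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, plan_dim)  # flattened hand/object configuration plan

    def forward(self, current_img: torch.Tensor, goal_img: torch.Tensor) -> torch.Tensor:
        # Concatenate current and goal images along the channel dimension.
        x = torch.cat([current_img, goal_img], dim=1)
        return self.head(self.encoder(x))

class TranslationPolicy(nn.Module):
    """Plan-conditioned robot policy: maps the predicted plan plus the robot's
    observation to a robot action; trained on a small amount of in-domain data."""
    def __init__(self, plan_dim: int = 64, obs_dim: int = 64, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(plan_dim + obs_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, plan: torch.Tensor, robot_obs: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([plan, robot_obs], dim=-1))

# Zero-shot deployment: predict a human plan, then translate it to robot actions.
predictor, policy = PlanPredictor(), TranslationPolicy()
current_img = torch.randn(1, 3, 128, 128)
goal_img = torch.randn(1, 3, 128, 128)
robot_obs = torch.randn(1, 64)
plan = predictor(current_img, goal_img)
action = policy(plan, robot_obs)   # e.g. an end-effector command
print(action.shape)                # torch.Size([1, 7])
```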
Related papers
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084]
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions.
We develop a simple approach, MOO, which leverages a pre-trained vision-language model to extract object-identifying information from the language command and the image.
In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- Zero-Shot Robot Manipulation from Passive Human Videos [59.193076151832145]
We develop a framework for extracting agent-agnostic action representations from human videos.
Our framework is based on predicting plausible human hand trajectories.
We deploy the trained model zero-shot for physical robot manipulation tasks.
arXiv Detail & Related papers (2023-02-03T21:39:52Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective (a rough sketch of this reward appears after this list).
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
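Regarding the reward-learning entry above ("Learning Reward Functions for Robotic Manipulation by Observing Humans"), a minimal sketch of the stated idea, a reward defined as distance to a goal image in a learned embedding space, is given below. The encoder architecture and all names are illustrative assumptions; the time-contrastive training of the embedding is not shown.
```python
# Illustrative sketch only: reward as negative distance to a goal in a learned
# embedding space. Encoder and names are assumptions made for exposition.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to an embedding; in the cited work such an encoder would be
    trained with a time-contrastive objective on videos of humans."""
    def __init__(self, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)

def embedding_reward(encoder: Encoder, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    """Reward = negative Euclidean distance between the current observation and the
    goal in embedding space: closer to the goal means higher reward."""
    return -torch.norm(encoder(obs) - encoder(goal), dim=-1)

encoder = Encoder()
obs, goal = torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128)
print(embedding_reward(encoder, obs, goal))  # batched scalar reward
```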