Generalization Through Hand-Eye Coordination: An Action Space for
Learning Spatially-Invariant Visuomotor Control
- URL: http://arxiv.org/abs/2103.00375v1
- Date: Sun, 28 Feb 2021 01:49:13 GMT
- Title: Generalization Through Hand-Eye Coordination: An Action Space for
Learning Spatially-Invariant Visuomotor Control
- Authors: Chen Wang, Rui Wang, Danfei Xu, Ajay Mandlekar, Li Fei-Fei, Silvio
Savarese
- Abstract summary: Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data.
Hand-eye Action Networks (HAN) approximate human hand-eye coordination behaviors by learning from human-teleoperated demonstrations.
- Score: 67.23580984118479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation Learning (IL) is an effective framework to learn visuomotor skills
from offline demonstration data. However, IL methods often fail to generalize
to new scene configurations not covered by training data. On the other hand,
humans can manipulate objects under varying conditions. Key to this capability is
hand-eye coordination, a cognitive ability that enables humans to adaptively
direct their movements at task-relevant objects while remaining invariant to the
objects' absolute spatial locations. In this work, we present a learnable action
space, Hand-eye Action Networks (HAN), that approximates human hand-eye
coordination behaviors by learning from human-teleoperated demonstrations.
Through a set of challenging multi-stage manipulation tasks, we show that a
visuomotor policy equipped with HAN is able to inherit the key spatial
invariance property of hand-eye coordination and achieve zero-shot
generalization to new scene configurations. Additional materials are available at
https://sites.google.com/stanford.edu/han
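To make the core idea concrete, below is a minimal sketch of a gaze-relative action space in PyTorch: an "eye" module predicts a task-relevant gaze point, and a "hand" module emits motion relative to that point, so the policy's output does not depend on the object's absolute workspace position. This is an illustrative approximation of this kind of spatially-invariant parameterization, not the authors' HAN implementation; the module names, network sizes, and the simple gaze-plus-offset action head are assumptions.

```python
# Illustrative sketch (not the paper's HAN implementation): a policy whose
# action is parameterized relative to a predicted, task-relevant gaze point,
# making the emitted hand motion invariant to the object's absolute location.
import torch
import torch.nn as nn


class GazeRelativePolicy(nn.Module):
    def __init__(self, feat_dim=64, proprio_dim=7, action_dim=3):
        super().__init__()
        # "Eye": predicts a 3D gaze target (task-relevant point) from visual features.
        self.eye = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 3)
        )
        # "Hand": predicts a displacement expressed in the gaze-centered frame.
        self.hand = nn.Sequential(
            nn.Linear(feat_dim + proprio_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, visual_feat, eef_pos, proprio):
        gaze = self.eye(visual_feat)          # where to "look"
        # The hand reasons about the end-effector relative to the gaze target,
        # which removes dependence on the object's absolute position.
        rel_eef = eef_pos - gaze
        delta = self.hand(torch.cat([visual_feat, proprio, rel_eef], dim=-1))
        # Final Cartesian target = gaze point + gaze-relative displacement.
        return gaze + delta


# Behavior cloning on teleoperated demonstrations would supervise the final
# target (and the gaze point as well, if gaze labels are available).
policy = GazeRelativePolicy()
feat = torch.randn(1, 64)      # placeholder visual features
eef = torch.randn(1, 3)        # current end-effector position
proprio = torch.randn(1, 7)    # joint angles, gripper state, etc.
target = policy(feat, eef, proprio)
```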
Related papers
- Zero-Cost Whole-Body Teleoperation for Mobile Manipulation [8.71539730969424]
MoMa-Teleop is a novel teleoperation method that delegates the base motions to a reinforcement learning agent.
We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
arXiv Detail & Related papers (2024-09-23T15:09:45Z)
- Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning [21.944363082061333]
We propose Maniwhere, a generalizable framework tailored for visual reinforcement learning.
Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-22T17:29:02Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is challenging and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Object Motion Guided Human Motion Synthesis [22.08240141115053]
We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
arXiv Detail & Related papers (2023-09-28T08:22:00Z)
- Synthesizing Diverse Human Motions in 3D Indoor Scenes [16.948649870341782]
We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner.
Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with.
We propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously.
arXiv Detail & Related papers (2023-05-21T09:22:24Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- ManiSkill: Learning-from-Demonstrations Benchmark for Generalizable Manipulation Skills [27.214053107733186]
We propose SAPIEN Manipulation Skill Benchmark (abbreviated as ManiSkill) for learning generalizable object manipulation skills.
ManiSkill supports object-level variations by utilizing a rich and diverse set of articulated objects.
ManiSkill can encourage the robot learning community to further explore learning generalizable object manipulation skills.
arXiv Detail & Related papers (2021-07-30T08:20:22Z)
- Learning Dexterous Grasping with Object-Centric Visual Affordances [86.49357517864937]
Dexterous robotic hands are appealing for their agility and human-like morphology.
We introduce an approach for learning dexterous grasping.
Our key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop.
arXiv Detail & Related papers (2020-09-03T04:00:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.