ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
- URL: http://arxiv.org/abs/2404.15709v2
- Date: Sun, 22 Sep 2024 22:44:51 GMT
- Title: ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
- Authors: Zerui Chen, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid
- Abstract summary: We propose ViViDex to improve vision-based policy learning from human videos.
It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video.
We then roll out successful episodes from the state-based policies and train a unified visual policy without using any privileged information.
- Score: 81.99559944822752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we aim to learn a unified vision-based policy for multi-fingered robot hands to manipulate a variety of objects in diverse poses. Though prior work has shown the benefits of using human videos for policy learning, performance gains have been limited by the noise in estimated trajectories. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework, ViViDex, to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then roll out successful episodes from the state-based policies and train a unified visual policy without using any privileged information. We propose a coordinate transformation to further enhance the visual point cloud representation, and compare behavior cloning and diffusion policy for training the visual policy. Experiments both in simulation and on a real robot demonstrate that ViViDex outperforms state-of-the-art approaches on three dexterous manipulation tasks.
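The two-stage pipeline in the abstract lends itself to a short illustration. The sketch below shows, under loose assumptions, what a trajectory-guided tracking reward (stage one) and a point-cloud coordinate transformation for the visual policy input could look like; the names TrajectoryGuidedReward and to_local_frame and the exact reward shape are hypothetical and not taken from the paper's code.

```python
# A minimal sketch of the two-stage pipeline described in the abstract:
# (1) per-video state-based policies trained with a trajectory-guided reward,
# (2) a unified visual policy trained from their successful rollouts.
# All names and the exact reward form below are illustrative assumptions,
# not the authors' released code.
import numpy as np

class TrajectoryGuidedReward:
    """Rewards tracking of a reference hand/object trajectory recovered
    from a human video (the 'trajectory-guided reward' in stage one)."""

    def __init__(self, ref_hand, ref_obj, w_hand=1.0, w_obj=1.0):
        self.ref_hand = ref_hand      # (T, D_hand) reference hand poses
        self.ref_obj = ref_obj        # (T, 3) reference object positions
        self.w_hand, self.w_obj = w_hand, w_obj

    def __call__(self, t, hand_pose, obj_pos):
        # Negative tracking error at step t: closer to the reference is better.
        e_hand = np.linalg.norm(hand_pose - self.ref_hand[t])
        e_obj = np.linalg.norm(obj_pos - self.ref_obj[t])
        return -(self.w_hand * e_hand + self.w_obj * e_obj)


def to_local_frame(points_world, origin, rot):
    """One plausible 'coordinate transformation' for the point-cloud input:
    express scene points in a local frame (origin, rot) rather than the
    world frame, so the visual policy sees a normalized observation."""
    # rot: (3, 3) local-to-world rotation; points_world: (N, 3) row vectors.
    return (points_world - origin) @ rot


if __name__ == "__main__":
    T = 50
    reward = TrajectoryGuidedReward(np.zeros((T, 24)), np.zeros((T, 3)))
    print(reward(0, 0.01 * np.ones(24), 0.01 * np.ones(3)))
    cloud = np.random.rand(1024, 3)
    print(to_local_frame(cloud, origin=cloud.mean(0), rot=np.eye(3)).shape)
```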
Related papers
- OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation [35.97702591413093]
We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video.
OKAMI uses open-world vision models to identify task-relevant objects and retarget the body motions and hand poses separately.
arXiv Detail & Related papers (2024-10-15T17:17:54Z)
- View-Invariant Policy Learning via Zero-Shot Novel View Synthesis [26.231630397802785]
We investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint.
We study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints.
For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments.
arXiv Detail & Related papers (2024-09-05T16:39:21Z)
- Vision-based Manipulation from Single Human Video with Open-World Object Graphs [58.23098483464538]
We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos.
We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video.
arXiv Detail & Related papers (2024-05-30T17:56:54Z)
- Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments.
Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals.
We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z)
- DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
- Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [26.47544415550067]
We propose to distill a state-based motion planner augmented policy into a visual control policy.
We evaluate our method on three manipulation tasks in obstructed environments.
Our framework is highly sample-efficient and outperforms the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-11T18:52:00Z)
- Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos.
On the other hand, robots that learn new actions either require substantial trial and error or rely on expert demonstrations that are challenging to obtain.
In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
- Learning Dexterous Grasping with Object-Centric Visual Affordances [86.49357517864937]
Dexterous robotic hands are appealing for their agility and human-like morphology.
We introduce an approach for learning dexterous grasping.
Our key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop.
arXiv Detail & Related papers (2020-09-03T04:00:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.