Related papers: ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

URL: http://arxiv.org/abs/2404.15709v1
Date: Wed, 24 Apr 2024 07:58:28 GMT
Title: ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
Authors: Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev
Abstract summary: We propose a new framework, ViViDex, to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video. We then roll out successful episodes from the state-based policies and train a unified visual policy without using any privileged information.
Score: 87.96864712314324
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework, ViViDex, to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then roll out successful episodes from the state-based policies and train a unified visual policy without using any privileged information. We also propose a coordinate transformation method that significantly boosts performance. We evaluate our method on three dexterous manipulation tasks and demonstrate a large improvement over state-of-the-art algorithms.

Related papers

Zero-Shot Visual Generalization in Robot Manipulation [0.13280779791485384]
Current approaches often sidestep the problem by relying on invariant representations such as point clouds and depth. Disentangled representation learning has recently shown promise in enabling vision-based reinforcement learning policies to be robust to visual distribution shifts. We demonstrate zero-shot adaptability to visual perturbations in both simulation and on real hardware.
arXiv Detail & Related papers (2025-05-16T22:01:46Z)
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations [19.45821593625599]
Video diffusion models (VDMs) have demonstrated the capability to accurately predict future image sequences. We propose the Video Prediction Policy (VPP), a generalist robotic policy conditioned on the predictive visual representations from VDMs. VPP consistently outperforms existing methods across two simulated and two real-world benchmarks.
arXiv Detail & Related papers (2024-12-19T12:48:40Z)
OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation [35.97702591413093]
We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video. OKAMI uses open-world vision models to identify task-relevant objects and retarget the body motions and hand poses separately.
arXiv Detail & Related papers (2024-10-15T17:17:54Z)
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis [26.231630397802785]
We investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. We study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints. For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments.
arXiv Detail & Related papers (2024-09-05T16:39:21Z)
Vision-based Manipulation from Single Human Video with Open-World Object Graphs [58.23098483464538]
We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video.
arXiv Detail & Related papers (2024-05-30T17:56:54Z)
Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z)
DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [26.47544415550067]
We propose to distill a state-based motion planner augmented policy to a visual control policy. We evaluate our method on three manipulation tasks in obstructed environments. Our framework is highly sample-efficient and outperforms the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-11T18:52:00Z)
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
Learning Dexterous Grasping with Object-Centric Visual Affordances [86.49357517864937]
Dexterous robotic hands are appealing for their agility and human-like morphology. We introduce an approach for learning dexterous grasping. Our key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop.
arXiv Detail & Related papers (2020-09-03T04:00:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.