UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse
Proposal Generation and Goal-Conditioned Policy
- URL: http://arxiv.org/abs/2303.00938v2
- Date: Sat, 25 Mar 2023 07:35:32 GMT
- Title: UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse
Proposal Generation and Goal-Conditioned Policy
- Authors: Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao
Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi,
He Wang
- Abstract summary: We tackle the problem of learning universal robotic dexterous grasping from a point cloud observation under a table-top setting.
Inspired by successful pipelines used in parallel gripper grasping, we split the task into two stages: 1) grasp proposal (pose) generation and 2) goal-conditioned grasp execution.
Our final pipeline becomes the first to achieve universal generalization for dexterous grasping, demonstrating an average success rate of more than 60% on thousands of object instances.
- Score: 23.362000826018612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we tackle the problem of learning universal robotic dexterous
grasping from a point cloud observation under a table-top setting. The goal is
to grasp and lift up objects in high-quality and diverse ways and generalize
across hundreds of categories and even the unseen. Inspired by successful
pipelines used in parallel gripper grasping, we split the task into two stages:
1) grasp proposal (pose) generation and 2) goal-conditioned grasp execution.
For the first stage, we propose a novel probabilistic model of grasp pose
conditioned on the point cloud observation that factorizes rotation from
translation and articulation. Trained on our synthesized large-scale dexterous
grasp dataset, this model enables us to sample diverse and high-quality
dexterous grasp poses for the object point cloud. For the second stage, we
propose to replace the motion planning used in parallel gripper grasping with a
goal-conditioned grasp policy, due to the complexity involved in dexterous
grasping execution. Note that it is very challenging to learn this highly
generalizable grasp policy that only takes realistic inputs without oracle
states. We thus propose several important innovations, including state
canonicalization, object curriculum, and teacher-student distillation.
Integrating the two stages, our final pipeline becomes the first to achieve
universal generalization for dexterous grasping, demonstrating an average
success rate of more than 60% on thousands of object instances, which
significantly outperforms all baselines, meanwhile showing only a minimal
generalization gap.
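As a rough illustration of the two-stage design described above, here is a minimal, hypothetical PyTorch sketch: stage 1 models the grasp pose by sampling a rotation first and then predicting translation and hand articulation conditioned on the rotation and the point-cloud feature, and stage 2 consumes the sampled pose as the goal of a goal-conditioned policy. All module names, dimensions, and the discretized rotation head are assumptions made for illustration, not the paper's implementation; the paper's training machinery (state canonicalization, object curriculum, teacher-student distillation) is omitted here.

```python
# Hypothetical sketch of the two-stage pipeline (NOT the authors' code).
# Stage 1 factorizes p(R, t, q | pc) = p(R | pc) * p(t, q | R, pc):
# sample rotation R first, then translation t and joint angles q given (R, pc).
# Stage 2 feeds the sampled grasp pose as a goal to a goal-conditioned policy.
import torch
import torch.nn as nn


class GraspProposalModel(nn.Module):
    def __init__(self, pc_feat_dim=256, num_rot_bins=64, num_joints=22):
        super().__init__()
        # Stand-in per-point encoder; a PointNet-style backbone would be used in practice.
        self.pc_encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, pc_feat_dim))
        self.rot_head = nn.Linear(pc_feat_dim, num_rot_bins)                  # categorical over rotation bins (assumption)
        self.trans_art_head = nn.Linear(pc_feat_dim + num_rot_bins, 3 + num_joints)

    def forward(self, pc):
        # pc: (B, N, 3) object point cloud; global feature via max-pooling over points.
        feat = self.pc_encoder(pc).max(dim=1).values                          # (B, pc_feat_dim)
        rot_logits = self.rot_head(feat)
        rot_onehot = nn.functional.gumbel_softmax(rot_logits, hard=True)      # sample a rotation bin
        trans_art = self.trans_art_head(torch.cat([feat, rot_onehot], dim=-1))
        translation, joints = trans_art[:, :3], trans_art[:, 3:]              # conditioned on sampled rotation
        return rot_onehot, translation, joints


class GoalConditionedPolicy(nn.Module):
    """Maps (observation, grasp-pose goal) to hand/arm actions; trained with RL and distillation in the paper."""

    def __init__(self, obs_dim=512, goal_dim=64 + 3 + 22, act_dim=28):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))


if __name__ == "__main__":
    pc = torch.randn(2, 1024, 3)                    # dummy object point clouds
    rot, trans, joints = GraspProposalModel()(pc)   # stage 1: sample a grasp proposal
    goal = torch.cat([rot, trans, joints], dim=-1)
    action = GoalConditionedPolicy()(torch.randn(2, 512), goal)  # stage 2: goal-conditioned action
    print(action.shape)                             # (2, 28)
```

The point of the factorization, as the abstract suggests, is that conditioning translation and articulation on the sampled rotation lets the model represent several distinct grasp modes for the same object rather than averaging them.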
Related papers
- ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation [22.43711565969091]
ArticuBot is a system that learns a policy to open diverse categories of unseen articulated objects in the real world.
We show that our learned policy can zero-shot transfer to three different real robot settings.
arXiv Detail & Related papers (2025-03-04T22:51:50Z)
- DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping [14.511049253735834]
A general-purpose robot must be capable of grasping diverse objects in arbitrary scenarios.
Our solution is DexGraspVLA, a hierarchical framework that utilizes a pre-trained Vision-Language model as the high-level task planner.
Our method achieves a 90+% success rate under thousands of unseen object, lighting, and background combinations.
arXiv Detail & Related papers (2025-02-28T09:57:20Z)
- GHIL-Glue: Hierarchical Control with Filtered Subgoal Images [68.36060286192262]
Generative Hierarchical Imitation Learning-Glue (GHIL-Glue) is an interface to "glue together" language-conditioned image or video prediction models with low-level goal-conditioned policies.
GHIL-Glue filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts.
We find in extensive experiments in both simulated and real environments that GHIL-Glue achieves a 25% improvement across several hierarchical models that leverage generative subgoals.
arXiv Detail & Related papers (2024-10-26T00:32:21Z)
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [25.760946763103483]
We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks.
Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.
arXiv Detail & Related papers (2024-06-17T17:00:41Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation [34.26170741722835]
We propose an end-to-end architecture that compensates for and identifies partial point clouds on the fly.
Hierarchical self-distillation (HSD) can be applied to arbitrary hierarchy-based point cloud methods.
arXiv Detail & Related papers (2023-12-28T08:51:04Z)
- GraspGF: Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping [11.63059055320262]
We propose a novel task called human-assisting dexterous grasping.
It aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects.
arXiv Detail & Related papers (2023-09-12T08:12:32Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Point Cloud Upsampling via Cascaded Refinement Network [39.79759035338819]
Upsampling a point cloud in a coarse-to-fine manner is a decent solution.
Existing coarse-to-fine upsampling methods require extra training strategies.
In this paper, we propose a simple yet effective cascaded refinement network.
arXiv Detail & Related papers (2022-10-08T07:09:37Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation [79.60988242843437]
We propose a novel approach that achieves self-supervised and magnification-flexible point clouds upsampling simultaneously.
Experimental results demonstrate that our self-supervised learning based scheme achieves competitive or even better performance than supervised learning based state-of-the-art methods.
arXiv Detail & Related papers (2022-04-18T07:18:25Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.