Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
- URL: http://arxiv.org/abs/2510.06068v1
- Date: Tue, 07 Oct 2025 15:57:00 GMT
- Title: Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
- Authors: Heng Zhang, Kevin Yuchen Ma, Mike Zheng Shou, Weisi Lin, Yan Wu
- Abstract summary: Existing end-to-end methods require training on large-scale datasets for specific hands. We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation.
- Score: 82.63833405368159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dexterous grasping with multi-fingered hands remains challenging due to high-dimensional articulations and the cost of optimization-based pipelines. Existing end-to-end methods require training on large-scale datasets for specific hands, limiting their ability to generalize across different embodiments. We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation. From a hand's morphology description, we derive a morphology embedding and an eigengrasp set. Conditioned on these, together with the object point cloud and wrist pose, an amplitude predictor regresses articulation coefficients in a low-dimensional space, which are decoded into full joint articulations. Articulation learning is supervised with a Kinematic-Aware Articulation Loss (KAL) that emphasizes fingertip-relevant motions and injects morphology-specific structure. In simulation on unseen objects across three dexterous hands, our model attains a 91.9% average grasp success rate with less than 0.4 seconds inference per grasp. With few-shot adaptation to an unseen hand, it achieves 85.6% success on unseen objects in simulation, and real-world experiments on this few-shot generalized hand achieve an 87% success rate. The code and additional materials will be made available upon publication on our project website https://connor-zh.github.io/cross_embodiment_dexterous_grasping.
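The abstract's core idea is that an amplitude predictor regresses coefficients in a low-dimensional eigengrasp space, which are then decoded into full joint articulations. A minimal numpy sketch of that decoding step is below; the eigengrasp basis is built here with PCA over example grasps, and all dimensions, variable names, and the PCA construction are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_joints = 16          # DoF of a hypothetical dexterous hand
n_eigengrasps = 5      # size of the low-dimensional articulation space
n_samples = 200        # example grasps used to derive the basis

# Derive an eigengrasp basis from sample articulations (PCA via SVD).
grasps = rng.normal(size=(n_samples, n_joints))      # stand-in grasp data
mean_grasp = grasps.mean(axis=0)
_, _, vt = np.linalg.svd(grasps - mean_grasp, full_matrices=False)
basis = vt[:n_eigengrasps]                           # (n_eigengrasps, n_joints)

# Decode predicted amplitudes into a full joint articulation:
# a network would output `amplitudes`; decoding is a linear map plus mean.
amplitudes = rng.normal(size=n_eigengrasps)          # stand-in for network output
articulation = mean_grasp + amplitudes @ basis       # (n_joints,)
```

The low-dimensional parameterization is what lets a single predictor transfer across hands: only the basis and mean are hand-specific, while the amplitude space stays small and shared.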
Related papers
- Structural Action Transformer for 3D Dexterous Manipulation [80.07649565189035]
Cross-embodiment skill transfer is a challenge for high-DoF robotic hands. Existing methods, often relying on 2D observations and temporal-centric action representations, struggle to capture 3D spatial relations. This paper proposes a new 3D dexterous manipulation policy that challenges this paradigm by introducing a structural-centric perspective.
arXiv Detail & Related papers (2026-03-04T11:38:12Z) - CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations [53.37721117405022]
Cross-embodiment dexterous grasp synthesis refers to adaptively generating and optimizing grasps for various robotic hands. We propose CEDex, a novel cross-embodiment dexterous grasp synthesis method at scale. We construct the largest cross-embodiment grasp dataset to date, comprising 500K objects across four types with 20M total grasps.
arXiv Detail & Related papers (2025-09-29T12:08:04Z) - ARMO: Autoregressive Rigging for Multi-Category Objects [8.030479370619458]
We introduce OmniRig, the first large-scale rigging dataset, comprising 79,499 meshes with detailed skeleton and skinning information. Unlike traditional benchmarks that rely on predefined standard poses, our dataset embraces diverse shape categories, styles, and poses. We propose ARMO, a novel rigging framework that utilizes an autoregressive model to predict both joint positions and connectivity relationships in a unified manner.
arXiv Detail & Related papers (2025-03-26T15:56:48Z) - DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness [38.310226324389596]
A dexterous hand capable of grasping any object is essential for the development of general-purpose embodied robots. We introduce DexGrasp Anything, a method that integrates physical constraints into the training and sampling phases of a diffusion-based generative model. We present a new dexterous grasping dataset containing over 3.4 million diverse grasping poses for more than 15k different objects.
arXiv Detail & Related papers (2025-03-11T10:21:50Z) - MagicArticulate: Make Your 3D Models Articulation-Ready [109.35703811628045]
We present MagicArticulate, an effective framework that automatically transforms static 3D models into articulation-ready assets. Our key contributions are threefold. First, we introduce the Articulation-XL benchmark containing over 33k 3D models with high-quality articulation annotations, carefully curated from Objaverse-XL. Extensive experiments demonstrate that MagicArticulate significantly outperforms existing methods across diverse object categories.
arXiv Detail & Related papers (2025-02-17T18:53:27Z) - DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes [18.95051035812627]
We present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps.
We also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry.
With the aid of test-time depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining a 90.7% real-world dexterous grasping success rate in cluttered scenes.
arXiv Detail & Related papers (2024-10-30T13:30:39Z) - ContactDexNet: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping [14.674925349389179]
We develop a method for generating multi-fingered hand grasp samples in cluttered settings through a contact semantic map. We also propose a generation method for a multi-modal, multi-fingered grasping dataset.
arXiv Detail & Related papers (2024-04-12T23:11:36Z) - Combining Shape Completion and Grasp Prediction for Fast and Versatile Grasping with a Multi-Fingered Hand [2.4682909476447588]
We present a novel, fast, and high-fidelity deep learning pipeline consisting of a shape completion module and a grasp predictor.
As grasp predictor, we use our two-stage architecture that first generates hand poses using an autoregressive model and then regresses finger joint configurations per pose.
Experiments on a physical robot platform demonstrate successful grasping of a wide range of household objects based on a depth image from a single viewpoint.
arXiv Detail & Related papers (2023-10-31T10:46:19Z) - GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulating for Articulated Objects (GAMMA)
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z) - Multiscale Residual Learning of Graph Convolutional Sequence Chunks for Human Motion Prediction [23.212848643552395]
A new method is proposed for human motion prediction by learning temporal and spatial dependencies.
Our proposed method effectively models sequence information for motion prediction and outperforms other techniques, setting a new state of the art.
arXiv Detail & Related papers (2023-08-31T15:23:33Z) - Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information (including all content) and is not responsible for any consequences.