Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration
- URL: http://arxiv.org/abs/2601.05243v1
- Date: Thu, 08 Jan 2026 18:59:30 GMT
- Title: Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration
- Authors: Xingyi He, Adhitya Polavaram, Yunhao Cao, Om Deshmukh, Tianrui Wang, Xiaowei Zhou, Kuan Fang,
- Abstract summary: We present CorDex, a framework that robustly learns dexterous functional grasps of novel objects from synthetic data. At the core of our approach is a correspondence-based data engine that generates diverse, high-quality training data in simulation. Building on the generated data, we introduce a multimodal prediction network that integrates visual and geometric information.
- Score: 23.251258563998253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scarcity of large-scale datasets and the absence of integrated semantic and geometric reasoning in learned models. In this work, we present CorDex, a framework that robustly learns dexterous functional grasps of novel objects from synthetic data generated from just a single human demonstration. At the core of our approach is a correspondence-based data engine that generates diverse, high-quality training data in simulation. Based on the human demonstration, our data engine generates diverse object instances of the same category, transfers the expert grasp to the generated objects through correspondence estimation, and adapts the grasp through optimization. Building on the generated data, we introduce a multimodal prediction network that integrates visual and geometric information. By devising a local-global fusion module and an importance-aware sampling mechanism, we enable robust and computationally efficient prediction of functional dexterous grasps. Through extensive experiments across various object categories, we demonstrate that CorDex generalizes well to unseen object instances and significantly outperforms state-of-the-art baselines.
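The generate-transfer-adapt pipeline described above can be illustrated concretely. Below is a self-contained NumPy sketch of the data-engine loop with deliberately crude stand-ins: random anisotropic scaling in place of the generative shape model, nearest-neighbor matching in a normalized frame in place of learned correspondence estimation, and surface snapping in place of grasp optimization. All names are illustrative and do not correspond to the CorDex implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_instance(pts):
    """Generate a same-category object variant: random anisotropic scaling
    plus small shape noise (a stand-in for a learned shape generator)."""
    scale = rng.uniform(0.8, 1.2, size=3)
    return pts * scale + rng.normal(0.0, 0.002, size=pts.shape)

def estimate_correspondence(src, dst):
    """Nearest-neighbor matching in a normalized frame (a crude stand-in
    for learned dense correspondence)."""
    def normalize(p):
        p = p - p.mean(axis=0)
        return p / np.abs(p).max()
    s, d = normalize(src), normalize(dst)
    dists = ((s[:, None, :] - d[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)   # for each src point, the closest dst index

def transfer_and_adapt(contact_idx, src, dst):
    """Transfer demo contact points onto the new object and snap them to its
    surface (one-step 'adaptation'; the paper uses full grasp optimization)."""
    corr = estimate_correspondence(src, dst)
    return dst[corr[contact_idx]]

# a single "demonstration": a unit-sphere point cloud with 5 fingertip contacts
demo_pts = rng.normal(size=(1024, 3))
demo_pts /= np.linalg.norm(demo_pts, axis=1, keepdims=True)
demo_contacts = rng.choice(len(demo_pts), size=5, replace=False)

dataset = []
for _ in range(100):
    new_pts = generate_instance(demo_pts)           # 1. generate
    contacts = transfer_and_adapt(demo_contacts,    # 2. transfer + 3. adapt
                                  demo_pts, new_pts)
    dataset.append((new_pts, contacts))
print(len(dataset), dataset[0][1].shape)            # 100 (5, 3)
```

The point of the sketch is the data flow: one demonstration fans out into many (object, grasp) training pairs, which is what makes the downstream prediction network trainable from a single human example.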
Related papers
- URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model [76.08429266631823]
We propose an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction. Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-11-02T13:45:51Z)
- CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations [53.37721117405022]
Cross-embodiment dexterous grasp synthesis refers to adaptively generating and optimizing grasps for various robotic hands. We propose CEDex, a novel cross-embodiment dexterous grasp synthesis method at scale. We construct the largest cross-embodiment grasp dataset to date, comprising 500K objects across four types with 20M total grasps.
arXiv Detail & Related papers (2025-09-29T12:08:04Z)
- The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning [4.812580392361432]
The Well is a large-scale collection of numerical simulations of a wide variety of physical systems. These datasets can be used individually or as part of a broader benchmark suite. We provide a unified PyTorch interface for training and evaluating models.
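As a rough illustration of what a unified PyTorch interface over such simulations enables, here is a hypothetical Dataset that serves (history window, next snapshot) pairs from a trajectory tensor. The class and argument names are invented for this sketch and are not the Well's actual API.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SimulationDataset(Dataset):
    """Serves (past snapshots, next snapshot) pairs from one trajectory."""
    def __init__(self, fields: torch.Tensor, history: int = 4):
        # fields: (time, channels, height, width) tensor of physical fields
        self.fields, self.history = fields, history

    def __len__(self):
        return self.fields.shape[0] - self.history

    def __getitem__(self, t):
        x = self.fields[t : t + self.history]   # input window
        y = self.fields[t + self.history]       # target: the next snapshot
        return x, y

# synthetic stand-in trajectory: 64 timesteps of a 2-channel 32x32 field
traj = torch.randn(64, 2, 32, 32)
loader = DataLoader(SimulationDataset(traj), batch_size=8, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([8, 4, 2, 32, 32]) torch.Size([8, 2, 32, 32])
```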
arXiv Detail & Related papers (2024-11-30T19:42:14Z)
- LS-HAR: Language Supervised Human Action Recognition with Salient Fusion, Construction Sites as a Use-Case [7.565275399668322]
We introduce a novel approach to Human Action Recognition (HAR) using language supervision, named LS-HAR. We employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation. We introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities, named VolvoConstAct.
arXiv Detail & Related papers (2024-10-02T19:10:23Z)
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model that generates large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis [6.460475042590685]
We present a human crowd simulator, called UniCrowd, and its associated validation pipeline.
We show how the simulator can generate annotated data suitable for computer vision tasks, in particular detection and segmentation, as well as related applications such as crowd counting, human pose estimation, trajectory analysis and prediction, and anomaly detection.
arXiv Detail & Related papers (2023-12-05T09:43:27Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled tri-level optimization formulation of the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
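To make the tri-level structure concrete, here is a schematic PyTorch loop under strong simplifying assumptions (a linear model, a short unrolled training loop, and a one-step FGSM adversary); the paper's actual formulation and its provable guarantees are considerably more involved.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_syn, eps = 10, 32, 0.1
X_syn = torch.randn(n_syn, d, requires_grad=True)   # level 1: learnable dataset
y_syn = torch.randint(0, 2, (n_syn,))
X_ref = torch.randn(256, d)                          # reference (clean) data
y_ref = (X_ref[:, 0] > 0).long()
opt = torch.optim.Adam([X_syn], lr=0.05)

for step in range(30):
    # level 2: unroll training of a linear model on the synthetic dataset,
    # keeping the graph so gradients can flow back to X_syn
    w = torch.zeros(d, 2, requires_grad=True)
    for _ in range(10):
        g, = torch.autograd.grad(F.cross_entropy(X_syn @ w, y_syn),
                                 w, create_graph=True)
        w = w - 0.5 * g
    # level 3: worst-case input perturbation of the reference data (FGSM)
    X_adv = X_ref.clone().requires_grad_(True)
    gx, = torch.autograd.grad(F.cross_entropy(X_adv @ w, y_ref),
                              X_adv, retain_graph=True)
    robust_loss = F.cross_entropy((X_ref + eps * gx.sign()) @ w, y_ref)
    # level 1: update the dataset so the model it induces is robust
    opt.zero_grad(); robust_loss.backward(); opt.step()
print(round(robust_loss.item(), 3))
```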
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
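As a toy illustration of coupling a Gaussian and a multinomial likelihood through a shared latent code, the following sketch performs MAP inference of the latents with known loading matrices; the paper's algorithm additionally learns the model parameters, so treat this purely as a schematic.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d_lat, d_real, n_cat = 500, 4, 8, 5
z_true = torch.randn(n, d_lat)
W_r, W_c = torch.randn(d_lat, d_real), torch.randn(d_lat, n_cat)
x_real = z_true @ W_r + 0.1 * torch.randn(n, d_real)     # Gaussian features
x_cat = torch.distributions.Categorical(logits=z_true @ W_c).sample()  # categorical

# MAP inference of a shared latent code by minimizing the joint negative
# log-likelihood: a Gaussian term for x_real, a multinomial term for x_cat
z = torch.zeros(n, d_lat, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)
for _ in range(200):
    nll_real = 0.5 * ((z @ W_r - x_real) ** 2).sum()             # Gaussian
    nll_cat = F.cross_entropy(z @ W_c, x_cat, reduction="sum")   # multinomial
    loss = nll_real + nll_cat + 0.5 * (z ** 2).sum()             # N(0, I) prior
    opt.zero_grad(); loss.backward(); opt.step()
print(round(loss.item(), 1))
```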
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
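A simplified sketch of the idea: pool a variable-size input set against a trainable reference using an entropy-regularized (Sinkhorn) transport plan, producing a fixed-size embedding. The paper's embedding differs in its kernelization and its connection to attention; this toy version only shows the aggregation mechanism.

```python
import torch

def sinkhorn(cost, n_iters=30, eps=0.1):
    """Entropy-regularized OT plan between uniform marginals."""
    K = torch.exp(-cost / eps)                    # (n, m) Gibbs kernel
    u = torch.ones(cost.shape[0]) / cost.shape[0]
    v = torch.ones(cost.shape[1]) / cost.shape[1]
    a, b = u.clone(), v.clone()
    for _ in range(n_iters):                      # alternating marginal scaling
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]            # transport plan

def ot_embed(x, reference):
    """Aggregate a set x of shape (n, d) into a fixed-size (m, d) output."""
    P = sinkhorn(torch.cdist(x, reference) ** 2)  # (n, m) plan
    return (P / P.sum(0, keepdim=True)).T @ x     # per-slot weighted averages

reference = torch.nn.Parameter(torch.randn(10, 16))  # trainable, m = 10 slots
x = torch.randn(37, 16)                              # variable-size input set
print(ot_embed(x, reference).shape)                  # torch.Size([10, 16])
```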
arXiv Detail & Related papers (2020-06-22T08:35:58Z)