Learning Continuous Grasping Function with a Dexterous Hand from Human Demonstrations
- URL: http://arxiv.org/abs/2207.05053v3
- Date: Sun, 19 Mar 2023 05:12:15 GMT
- Title: Learning Continuous Grasping Function with a Dexterous Hand from Human Demonstrations
- Authors: Jianglong Ye, Jiashun Wang, Binghao Huang, Yuzhe Qin, Xiaolong Wang
- Abstract summary: We name the proposed model Continuous Grasping Function (CGF).
CGF is learned via generative modeling with a Conditional Variational Autoencoder using 3D human demonstrations.
Compared to previous planning algorithms, CGF is more efficient and achieves a significant improvement in success rate when transferred to grasping with the real Allegro Hand.
- Score: 7.733935820533302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to learn to generate grasping motion for manipulation with a
dexterous hand using implicit functions. With continuous time inputs, the model
can generate a continuous and smooth grasping plan. We name the proposed model
Continuous Grasping Function (CGF). CGF is learned via generative modeling with
a Conditional Variational Autoencoder using 3D human demonstrations. We will
first convert the large-scale human-object interaction trajectories to robot
demonstrations via motion retargeting, and then use these demonstrations to
train CGF. During inference, we perform sampling with CGF to generate different
grasping plans in the simulator and select the successful ones to transfer to
the real robot. By training on diverse human data, our CGF allows
generalization to manipulate multiple objects. Compared to previous planning
algorithms, CGF is more efficient and achieves a significant improvement in
success rate when transferred to grasping with the real Allegro Hand. Our
project page is available at https://jianglongye.com/cgf .
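The abstract describes CGF as an implicit function: a latent code, an object representation, and a continuous time input are decoded into a hand configuration, and sampling different latent codes yields different grasp plans. Below is a minimal, hypothetical PyTorch sketch of that interface; the layer sizes, the object-feature placeholder, and the 22-dimensional output (16 Allegro joint angles plus a 6-DoF wrist pose) are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ContinuousGraspDecoder(nn.Module):
    """Hypothetical sketch of a CGF-style decoder: (latent z, object
    feature, continuous time t) -> hand configuration. The 22-dim
    output assumes 16 Allegro joint angles + a 6-DoF wrist pose."""

    def __init__(self, z_dim=64, obj_dim=128, out_dim=22):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + obj_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, z, obj_feat, t):
        # t in [0, 1] parameterizes the motion; querying a dense grid
        # of t values yields a continuous, smooth grasping plan.
        return self.net(torch.cat([z, obj_feat, t], dim=-1))

decoder = ContinuousGraspDecoder()
ts = torch.linspace(0, 1, 50).unsqueeze(-1)    # (50, 1) time queries
obj_feat = torch.randn(1, 128).expand(50, -1)  # placeholder object code
for _ in range(4):                             # sample 4 candidate plans
    z = torch.randn(1, 64).expand(50, -1)
    plan = decoder(z, obj_feat, ts)            # (50, 22) trajectory
```

In the paper's pipeline, each sampled plan is rolled out in a simulator and only the successful candidates are transferred to the real robot.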
Related papers
- GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation [35.25620666966874]
Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations.
We propose a Vision-to-4D-to-Action framework that enables direct action reasoning from motion-aware 4D representations via a Gaussian Action Field (GAF).
Experiments demonstrate significant improvements, achieving gains of +11.5385 dB PSNR and -0.5574 LPIPS in reconstruction quality while boosting the average success rate in robotic manipulation tasks by 10.33%.
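As context for the quoted numbers, PSNR (higher is better) has a simple closed form, while LPIPS is a learned perceptual distance (lower is better) that requires a pretrained network such as the lpips package. A minimal PSNR sketch:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A +11.5 dB gain means the mean squared error dropped by a factor
# of roughly 10 ** (11.5 / 10), i.e. about 14x.
img = np.random.rand(64, 64, 3)
noisy = np.clip(img + 0.05 * np.random.randn(64, 64, 3), 0, 1)
print(f"PSNR: {psnr(noisy, img):.2f} dB")
```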
arXiv Detail & Related papers (2025-06-17T02:55:20Z)
- ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation [11.233768932957771]
3D flow represents the motion trend of 3D particles within a scene.
ManiTrend is a unified framework that models the dynamics of 3D particles, vision observations and manipulation actions.
Our method achieves state-of-the-art performance with high efficiency.
arXiv Detail & Related papers (2025-02-14T09:13:57Z)
- G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation [65.86819811007157]
We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D representation, by leveraging foundation models.
Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates.
Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.
arXiv Detail & Related papers (2024-11-27T14:17:43Z)
- Hand-Object Interaction Pretraining from Videos [77.92637809322231]
We learn general robot manipulation priors from 3D hand-object interaction trajectories.
We do so by mapping both the human hand and the manipulated object into a shared 3D space and retargeting human motions to robot actions.
We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches.
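Both this work and CGF above convert human hand motion into robot hand motion via retargeting. A common formulation solves for robot joint angles that minimize the distance between corresponding human and robot fingertip positions; the toy planar two-link forward kinematics and link lengths below are purely illustrative stand-ins for a real hand model.

```python
import numpy as np
from scipy.optimize import minimize

def robot_fingertip(joints):
    """Toy planar 2-link finger FK (link lengths 0.04 m and 0.03 m);
    stands in for the real robot hand's forward kinematics."""
    q1, q2 = joints
    x = 0.04 * np.cos(q1) + 0.03 * np.cos(q1 + q2)
    y = 0.04 * np.sin(q1) + 0.03 * np.sin(q1 + q2)
    return np.array([x, y])

def retarget(human_tip, q0=np.zeros(2)):
    """Find joint angles whose fingertip best matches the human fingertip."""
    cost = lambda q: np.sum((robot_fingertip(q) - human_tip) ** 2)
    return minimize(cost, q0, method="Nelder-Mead").x

# One frame of a human demonstration -> robot joint targets.
human_tip = np.array([0.05, 0.03])
print(retarget(human_tip))
```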
arXiv Detail & Related papers (2024-09-12T17:59:07Z)
- Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer [62.29951737214263]
Existing algorithms directly generate the full sequence, which is expensive and prone to errors.
We propose KeyMotion, which generates plausible human motion sequences corresponding to input text.
We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space.
For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the design latents and text condition.
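For reference, the Kullback-Leibler-regularized VAE objective mentioned above has a standard closed form when the approximate posterior is diagonal-Gaussian and the prior is standard normal; this generic sketch (with an invented beta weight) is not KeyMotion's code.

```python
import torch

def vae_loss(recon, target, mu, logvar, beta=1e-3):
    """Reconstruction + beta-weighted KL(q(z|x) || N(0, I));
    the closed-form KL assumes a diagonal-Gaussian posterior."""
    recon_term = torch.mean((recon - target) ** 2)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + beta * kl

mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)
x = torch.randn(8, 32)
print(vae_loss(x, x, mu, logvar))  # 0: standard-normal posterior, exact recon
```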
arXiv Detail & Related papers (2024-05-24T11:12:37Z)
- DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model.
It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation.
Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
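That objective, a distance between a language-instruction embedding and a simulated-observation embedding, can be written generically as a cosine distance between two encoders' outputs. The sketch below uses randomly initialized placeholder encoders; DiffGen's actual vision-language model and its differentiable simulation and rendering path are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Placeholder encoders standing in for a vision-language model;
# DiffGen backpropagates such a distance through differentiable
# rendering and physics simulation to shape the demonstration.
text_encoder = torch.nn.Linear(300, 128)
obs_encoder = torch.nn.Linear(1024, 128)

def embedding_distance(instr_feat, obs_feat):
    e_text = F.normalize(text_encoder(instr_feat), dim=-1)
    e_obs = F.normalize(obs_encoder(obs_feat), dim=-1)
    return 1.0 - (e_text * e_obs).sum(-1)  # cosine distance

loss = embedding_distance(torch.randn(1, 300), torch.randn(1, 1024))
loss.mean().backward()  # gradients would flow toward simulated actions
```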
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
- Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models [71.64318025625833]
This paper presents a novel approach to generating the 3D motion of a human interacting with a target object.
Our framework first generates a set of milestones and then synthesizes the motion along them.
The experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity.
arXiv Detail & Related papers (2023-10-03T17:50:23Z)
- DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes [22.835683657191936]
Multi-fingered robotic grasping can potentially perform complex object manipulation.
Current techniques for multi-fingered robotic grasping frequently predict only a single grasp per inference.
This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge.
arXiv Detail & Related papers (2023-08-01T11:21:07Z)
- TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
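A minimal sketch of that idea, with invented sizes: condition and timestep embeddings are concatenated with noisy motion frames into one token sequence, so no cross-attention module is needed, and shallow-layer activations are added to mirrored deep layers through long skip connections.

```python
import torch
import torch.nn as nn

class SkipDenoiser(nn.Module):
    """Toy diffusion denoiser: all inputs (noisy motion frames, timestep,
    text condition) become tokens of one sequence; shallow activations
    feed mirrored deep layers via long skip connections."""

    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, motion_tok, time_tok, text_tok):
        # Conditions are ordinary tokens in the same sequence.
        x = torch.cat([time_tok, text_tok, motion_tok], dim=1)
        skips = []
        for i, layer in enumerate(self.layers):
            if i < len(self.layers) // 2:
                x = layer(x)
                skips.append(x)             # remember shallow activations
            else:
                x = layer(x + skips.pop())  # long skip into deep layers
        return x[:, -motion_tok.shape[1]:]  # denoised motion tokens

model = SkipDenoiser()
out = model(torch.randn(2, 30, 64), torch.randn(2, 1, 64), torch.randn(2, 8, 64))
```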
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [38.503337052122234]
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation.
We aim to synthesize robot trajectories for a variety of manipulation tasks given an open-set of instructions and an open-set of objects.
We demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions.
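A composable 3D value map can be illustrated as a sum of per-constraint voxel grids from which a planner picks high-value cells; the workspace, weights, and greedy waypoint choice below are invented for illustration, whereas in VoxPoser the language model composes such maps from open-vocabulary perception.

```python
import numpy as np

# Toy 3D workspace discretized into a 20^3 voxel grid.
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, 20)] * 3, indexing="ij"), -1)

target, obstacle = np.array([0.8, 0.5, 0.2]), np.array([0.5, 0.5, 0.2])

# Compose value maps: attraction to the target, repulsion from the obstacle.
attract = -np.linalg.norm(grid - target, axis=-1)
repel = -np.exp(-np.linalg.norm(grid - obstacle, axis=-1) ** 2 / 0.02)
value = attract + 2.0 * repel

# A greedy planner would steer toward the highest-value voxel.
best = np.unravel_index(np.argmax(value), value.shape)
print("waypoint:", grid[best])
```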
arXiv Detail & Related papers (2023-07-12T07:40:48Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters, and can achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.