FastGrasp: Efficient Grasp Synthesis with Diffusion
- URL: http://arxiv.org/abs/2411.14786v1
- Date: Fri, 22 Nov 2024 08:06:32 GMT
- Title: FastGrasp: Efficient Grasp Synthesis with Diffusion
- Authors: Xiaofei Wu, Tao Liu, Caoji Li, Yuexin Ma, Yujiao Shi, Xuming He
- Abstract summary: We introduce a novel diffusion-model-based approach that generates the grasping pose in a one-stage manner.
This allows us to significantly improve generation speed and the diversity of generated hand poses.
Our method achieves faster inference, higher diversity, and superior pose quality than state-of-the-art approaches.
- Score: 25.91329341243337
- Abstract: Effectively modeling the interaction between human hands and objects is challenging due to the complex physical constraints and the requirement for high generation efficiency in applications. Prior work often relies on computationally intensive two-stage pipelines that first generate an intermediate representation, such as contact maps, and then run an iterative optimization procedure that updates hand meshes to capture the hand-object relation. Because of the high computational complexity of the optimization stage, such strategies often suffer from slow inference. To address this limitation, this work introduces a novel diffusion-model-based approach that generates the grasping pose in a one-stage manner, which significantly improves generation speed and the diversity of generated hand poses. In particular, we develop a Latent Diffusion Model with an Adaptation Module for object-conditioned hand pose generation and a contact-aware loss to enforce the physical constraints between hands and objects. Extensive experiments demonstrate that our method achieves faster inference, higher diversity, and superior pose quality than state-of-the-art approaches. Code is available at https://github.com/wuxiaofei01/FastGrasp.
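The abstract mentions a contact-aware loss that enforces physical constraints between hand and object but does not define it here. The sketch below is only an illustration of what such a loss could look like, not the authors' formulation: it assumes point-cloud inputs, approximates signed distance with the nearest object point and its outward normal, and combines a penetration penalty with an attraction term. The function name `contact_aware_loss`, the `contact_mask` input, the threshold, and the weights are all hypothetical.

```python
import numpy as np


def contact_aware_loss(hand_pts, obj_pts, obj_normals, contact_mask,
                       contact_thresh=0.005, w_pen=1.0, w_att=1.0):
    """Illustrative contact-aware loss (hypothetical, not the paper's definition).

    hand_pts:     (H, 3) hand surface points
    obj_pts:      (O, 3) object surface points
    obj_normals:  (O, 3) outward unit normals at obj_pts
    contact_mask: (H,) bool, hand points expected to touch the object
    """
    # Pairwise distances between every hand point and every object point: (H, O).
    dist = np.linalg.norm(hand_pts[:, None, :] - obj_pts[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    d_min = dist[np.arange(len(hand_pts)), nearest]

    # Approximate signed distance: project the offset to the nearest object
    # point onto that point's outward normal (negative means inside).
    offset = hand_pts - obj_pts[nearest]
    signed = np.einsum("ij,ij->i", offset, obj_normals[nearest])

    # Penetration term: mean depth of hand points that lie inside the object.
    penetration = float(np.clip(-signed, 0.0, None).mean())

    # Attraction term: pull expected-contact points to within the threshold.
    gap = np.clip(d_min - contact_thresh, 0.0, None)
    attraction = float(gap[contact_mask].mean()) if contact_mask.any() else 0.0

    total = w_pen * penetration + w_att * attraction
    return total, penetration, attraction


if __name__ == "__main__":
    # Toy object: six points on the unit sphere, whose normals equal the points.
    obj = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    # One hand point inside the sphere, one hovering just outside it.
    hand = np.array([[0.5, 0.0, 0.0], [1.2, 0.0, 0.0]])
    mask = np.array([False, True])
    print(contact_aware_loss(hand, obj, obj, mask, contact_thresh=0.0))
```

In a one-stage generator, a term like this would be added to the training objective so that the network learns to avoid interpenetration directly, rather than relying on test-time optimization. The nearest-point signed distance used here is a coarse approximation that only behaves well for densely sampled, roughly convex objects.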
Related papers
- Diversify, Contextualize, and Adapt: Efficient Entropy Modeling for Neural Image Codec [11.078070771578837]
More efficient backward adaptation-based entropy models have been recently developed.
We argue that their performance has been limited by the simple adoption of the design convention for forward adaptation.
We propose a simple yet effective entropy modeling framework that leverages sufficient contexts for forward adaptation without compromising on bit-rate.
arXiv Detail & Related papers (2024-11-06T04:30:04Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large, generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion [36.9457697304841]
ManiDext is a unified hierarchical diffusion-based framework for generating hand manipulation and grasp poses.
Our key insight is that accurately modeling the contact correspondences between objects and hands during interactions is crucial.
Our framework first generates contact maps and correspondence embeddings on the object's surface.
Based on these fine-grained correspondences, we introduce a novel approach that integrates the iterative refinement process into the diffusion process.
arXiv Detail & Related papers (2024-09-14T04:28:44Z)
- Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation [27.206656215734295]
We propose a novel Decomposed Vector-Quantized Variational Autoencoder (DVQ-VAE) to generate realistic human grasps.
The part-aware decomposed architecture facilitates more precise management of the interaction between each component of the hand and the object.
Our model achieved about 14.1% relative improvement in the quality index compared to state-of-the-art methods on four widely adopted benchmarks.
arXiv Detail & Related papers (2024-07-19T06:41:16Z)
- InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and synthesis-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z)
- Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling [67.94143911629143]
We propose a generative Transformer VAE architecture to model hand pose and action.
To faithfully model the semantic dependency and different temporal granularity of hand pose and action, we decompose the framework into two cascaded VAE blocks.
Results show that our joint modeling of recognition and prediction improves over isolated solutions.
arXiv Detail & Related papers (2023-11-29T05:28:39Z)
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method, dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
- Learning Iterative Robust Transformation Synchronization [71.73273007900717]
We avoid handcrafting robust loss functions and instead propose to use graph neural networks (GNNs) to learn transformation synchronization.
arXiv Detail & Related papers (2021-11-01T07:03:14Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; for example, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.