4DHands: Reconstructing Interactive Hands in 4D with Transformers
- URL: http://arxiv.org/abs/2405.20330v2
- Date: Fri, 31 May 2024 10:52:56 GMT
- Title: 4DHands: Reconstructing Interactive Hands in 4D with Transformers
- Authors: Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Yebin Liu, Wei Jing, Qi Yan, Qianying Wang, Hongwen Zhang
- Abstract summary: We introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs.
We develop a transformer-based architecture with novel tokenization and feature fusion strategies.
The efficacy of our approach is validated on several benchmark datasets.
- Score: 35.983309206845036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a transformer-based architecture with novel tokenization and feature fusion strategies. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a Spatio-temporal Interaction Reasoning (SIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://4dhands.github.io.
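To make the described pipeline concrete, below is a minimal PyTorch sketch of how relation-aware two-hand tokens could be built and then fused with spatio-temporal attention; the class names, feature dimensions, the 4-D relative-box encoding, and the MANO-style output size are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# relation-aware two-hand tokenization (RAT) and spatio-temporal attention
# fusion (SIR). All names and dimensions below are assumptions.
import torch
import torch.nn as nn


class RelationAwareTokenizer(nn.Module):
    """Turns per-hand image features into tokens that also carry the relative
    position of the two hand crops (a stand-in for the paper's RAT idea)."""

    def __init__(self, feat_dim=256, token_dim=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, token_dim)
        # Embed a 4-D relative crop descriptor (e.g. offset + scale of each hand box).
        self.rel_embed = nn.Linear(4, token_dim)

    def forward(self, hand_feats, rel_boxes):
        # hand_feats: (B, T, H, N, feat_dim) with H hands and N tokens per hand
        # rel_boxes:  (B, T, H, 4) relative position/scale of each hand crop
        return self.proj(hand_feats) + self.rel_embed(rel_boxes).unsqueeze(-2)


class SpatioTemporalFusion(nn.Module):
    """Fuses hand tokens across hands and frames with self-attention and decodes
    per-hand pose parameters (a stand-in for the paper's SIR idea)."""

    def __init__(self, token_dim=256, n_heads=8, n_layers=4, out_dim=61):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # out_dim is a guess at MANO pose + shape + relative translation.
        self.head = nn.Linear(token_dim, out_dim)

    def forward(self, tokens):
        B, T, H, N, D = tokens.shape
        x = tokens.reshape(B, T * H * N, D)   # joint attention over time, hands, tokens
        x = self.encoder(x).reshape(B, T, H, N, D)
        return self.head(x.mean(dim=3))       # one parameter vector per hand per frame


if __name__ == "__main__":
    B, T, H, N, F = 2, 8, 2, 16, 256          # batch, frames, hands, tokens/hand, feat dim
    tokens = RelationAwareTokenizer()(torch.randn(B, T, H, N, F),
                                      torch.randn(B, T, H, 4))
    print(SpatioTemporalFusion()(tokens).shape)   # torch.Size([2, 8, 2, 61])
```

In this toy setup, a single-hand input could be handled by masking the tokens of the absent hand, which mirrors the abstract's claim that one tokenization scheme serves both single-hand and two-hand inputs.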
Related papers
- HandDiffuse: Generative Controllers for Two-Hand Interactions via Diffusion Models [48.56319454887096]
Existing hand datasets are largely short-range, and the interactions are weak due to the self-occlusion and self-similarity of hands.
To address this data scarcity, we propose HandDiffuse12.5M, a novel dataset consisting of temporal sequences with strong two-hand interactions.
arXiv Detail & Related papers (2023-12-08T07:07:13Z)
- Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn shape and appearance priors for hands and objects separately with neural representations.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction.
arXiv Detail & Related papers (2023-08-22T05:17:41Z)
- ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction [30.073586754012645]
We present ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios.
We evaluate our method on various types of hand reconstruction datasets.
arXiv Detail & Related papers (2023-03-10T14:19:02Z)
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands.
Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency.
We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z)
- Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z)
- Hand-Object Interaction Reasoning [33.612083150296364]
We show that modelling two-handed interactions is critical for action recognition in egocentric video.
We propose an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video.
arXiv Detail & Related papers (2022-01-13T11:53:12Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; notably, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)