Interacting Hand-Object Pose Estimation via Dense Mutual Attention
- URL: http://arxiv.org/abs/2211.08805v1
- Date: Wed, 16 Nov 2022 10:01:33 GMT
- Title: Interacting Hand-Object Pose Estimation via Dense Mutual Attention
- Authors: Rong Wang, Wei Mao, Hongdong Li
- Abstract summary: 3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
- Score: 97.26400229871888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D hand-object pose estimation is the key to the success of many computer
vision applications. The main focus of this task is to effectively model the
interaction between the hand and an object. To this end, existing works either
rely on interaction constraints in a computationally-expensive iterative
optimization, or consider only a sparse correlation between sampled hand and
object keypoints. In contrast, we propose a novel dense mutual attention
mechanism that is able to model fine-grained dependencies between the hand and
the object. Specifically, we first construct the hand and object graphs
according to their mesh structures. For each hand node, we aggregate features
from every object node by the learned attention and vice versa for each object
node. Thanks to such dense mutual attention, our method is able to produce
physically plausible poses with high quality and real-time inference speed.
Extensive quantitative and qualitative experiments on large benchmark datasets
show that our method outperforms state-of-the-art methods. The code is
available at https://github.com/rongakowang/DenseMutualAttention.git.
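The mechanism described in the abstract amounts to bidirectional cross-attention between mesh-derived node features: every hand node attends to every object node, and vice versa. Below is a minimal PyTorch sketch of that idea; the module, feature dimension, and node counts are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# A hedged sketch of dense mutual attention between hand and object graph
# nodes. Assumes per-node features have already been extracted; the graph
# construction and any graph convolutions from the paper are omitted.
import torch
import torch.nn as nn

class DenseMutualAttention(nn.Module):
    """Cross-attends each hand node to all object nodes and vice versa."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Hand nodes query object features; object nodes query hand features.
        self.hand_from_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_from_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand_feats, obj_feats):
        # hand_feats: (B, N_hand, dim), obj_feats: (B, N_obj, dim)
        hand_upd, _ = self.hand_from_obj(hand_feats, obj_feats, obj_feats)
        obj_upd, _ = self.obj_from_hand(obj_feats, hand_feats, hand_feats)
        # Residual connections keep each graph's own features in the mix.
        return hand_feats + hand_upd, obj_feats + obj_upd

# Example usage with mesh-derived node features, e.g. the 778 vertices of a
# MANO hand mesh and an assumed 1000 sampled object-mesh nodes:
hand = torch.randn(1, 778, 64)
obj = torch.randn(1, 1000, 64)
new_hand, new_obj = DenseMutualAttention()(hand, obj)
```

Because every hand node aggregates from every object node (and conversely), the attention is dense rather than restricted to a sparse set of sampled keypoints, which is the contrast the abstract draws with prior work.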
Related papers
- ORMNet: Object-centric Relationship Modeling for Egocentric Hand-object Segmentation [14.765419467710812]
Egocentric hand-object segmentation (EgoHOS) is a promising new task aiming at segmenting hands and interacting objects in egocentric images.
This paper proposes a novel Object-centric Relationship Modeling Network (ORMNet) to achieve effective, end-to-end EgoHOS.
arXiv Detail & Related papers (2024-07-08T03:17:10Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- Hand-Object Interaction Image Generation [135.87707468156057]
This work is dedicated to a new task, i.e., hand-object interaction image generation.
It aims to conditionally generate a hand-object image given the hand, the object, and their interaction status.
This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z)
- Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution [49.10497573378427]
Estimating the pose and shape of hands and objects under interaction finds numerous applications including augmented and virtual reality.
Our algorithm is agnostic to object models, and it learns the physical rules governing hand-object interaction.
Experiments on four widely-used benchmarks show that our framework surpasses state-of-the-art accuracy in 3D pose estimation and also recovers dense 3D hand and object shapes.
arXiv Detail & Related papers (2022-04-27T17:00:54Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework, namely in-GraphNet, that assembles in-Graph models for detecting HOIs.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands [16.343365158924183]
Manipulation tasks, such as within-hand manipulation, require the object's pose relative to a robot hand.
This paper presents a depth-based framework, which aims for robust pose estimation and short response times.
arXiv Detail & Related papers (2020-03-07T05:51:03Z)