Collaborative Learning for Hand and Object Reconstruction with
Attention-guided Graph Convolution
- URL: http://arxiv.org/abs/2204.13062v1
- Date: Wed, 27 Apr 2022 17:00:54 GMT
- Title: Collaborative Learning for Hand and Object Reconstruction with
Attention-guided Graph Convolution
- Authors: Tze Ho Elden Tse, Kwang In Kim, Ales Leonardis, Hyung Jin Chang
- Abstract summary: Estimating the pose and shape of hands and objects under interaction finds numerous applications including augmented and virtual reality.
Our algorithm is agnostic to object models, and it learns the physical rules governing hand-object interaction.
Experiments using four widely-used benchmarks show that our framework surpasses state-of-the-art accuracy in 3D pose estimation and recovers dense 3D hand and object shapes.
- Score: 49.10497573378427
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Estimating the pose and shape of hands and objects under interaction finds
numerous applications including augmented and virtual reality. Existing
approaches for hand and object reconstruction require explicitly defined
physical constraints and known objects, which limits their application domains.
Our algorithm is agnostic to object models, and it learns the physical rules
governing hand-object interaction. This requires automatically inferring the
shapes and physical interaction of hands and (potentially unknown) objects. We
seek to approach this challenging problem by proposing a collaborative learning
strategy in which two branches of deep networks learn from each other.
Specifically, we transfer hand mesh information to the object branch, and object
mesh information to the hand branch. The resulting optimisation (training) problem can be
unstable, and we address this via two strategies: (i) attention-guided graph
convolution which helps identify and focus on mutual occlusion and (ii)
unsupervised associative loss which facilitates the transfer of information
between the branches. Experiments using four widely-used benchmarks show that
our framework surpasses state-of-the-art accuracy in 3D pose estimation and
recovers dense 3D hand and object shapes. Each technical component above
contributes meaningfully in the ablation study.
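As a rough illustration of the attention-guided graph convolution idea, the sketch below shows a generic attention layer restricted to a mesh graph's edges. This is a NumPy sketch under common graph-attention assumptions, not the authors' exact layer; the function name, the projection matrices `Wq`, `Wk`, `Wv`, and the masking scheme are all illustrative.

```python
import numpy as np

def attention_guided_graph_conv(X, A, Wq, Wk, Wv):
    """One attention-guided graph convolution layer (generic sketch).

    X : (N, d) vertex features of a hand/object mesh graph
    A : (N, N) binary mesh adjacency
    Wq, Wk, Wv : (d, h) illustrative learned projections
    """
    A = A + np.eye(A.shape[0])                  # self-loops: every vertex attends to itself
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(K.shape[1])    # pairwise attention logits
    scores = np.where(A > 0, scores, -np.inf)   # attend only along mesh edges
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over each vertex's neighbourhood
    return attn @ V                             # attention-weighted neighbour aggregation
```

Masking the logits to the adjacency pattern is what lets such a layer learn to emphasise neighbours involved in mutual occlusion rather than averaging all of them uniformly.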
Related papers
- Hand-object reconstruction via interaction-aware graph attention mechanism [25.396356108313178]
Estimating the poses of both a hand and an object has become an important area of research.
We propose a graph-based refinement method that incorporates an interaction-aware graph-attention mechanism.
Experiments demonstrate the effectiveness of our proposed method with notable improvements in the realm of physical plausibility.
arXiv Detail & Related papers (2024-09-26T08:23:04Z)
- Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn the shape and appearance prior knowledge of hands and objects separately with the neural representation.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction.
arXiv Detail & Related papers (2023-08-22T05:17:41Z)
- Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects [1.8263882169310044]
We introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data.
We also introduce a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation.
arXiv Detail & Related papers (2023-06-28T01:18:53Z)
- HMDO: Markerless Multi-view Hand Manipulation Capture with Deformable Objects [8.711239906965893]
HMDO is the first markerless deformable interaction dataset recording interactive motions of the hands and deformable objects.
The proposed method can reconstruct interactive motions of hands and deformable objects with high quality.
arXiv Detail & Related papers (2023-01-18T16:55:15Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
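The dense mutual attention summarised above can be sketched generically as cross-attention between hand and object vertex features. This is an illustrative NumPy sketch of the general idea, not the cited paper's formulation; the function name, shapes, and projections are assumptions.

```python
import numpy as np

def dense_mutual_attention(H, O, Wq, Wk, Wv):
    """Dense cross-attention from hand vertices to object vertices (sketch).

    H : (Nh, d) hand vertex features
    O : (No, d) object vertex features
    Wq, Wk, Wv : (d, h) illustrative learned projections
    Returns object context for every hand vertex, shape (Nh, h).
    Calling it again with the roles swapped gives the mutual direction.
    """
    Q, K, V = H @ Wq, O @ Wk, O @ Wv
    scores = (Q @ K.T) / np.sqrt(K.shape[1])   # every hand vertex scores every object vertex
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)    # softmax over object vertices
    return attn @ V                            # object features weighted by relevance
```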
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
- S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that governs hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
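The photometric consistency idea in the entry above can be sketched as a masked L1 loss between a frame warped by the estimated pose and the observed frame. This is a schematic NumPy sketch only; the warping operation and validity mask are assumed to be computed elsewhere, and the function name is illustrative.

```python
import numpy as np

def photometric_loss(I_warped, I_target, mask):
    """Masked mean absolute photometric error (generic sketch).

    I_warped : (H, W, 3) frame t warped into frame t+1 using the estimated pose
    I_target : (H, W, 3) observed frame t+1
    mask     : (H, W)    1 where the warp is valid (visible, in-bounds)
    """
    err = np.abs(I_warped - I_target).mean(axis=-1)   # per-pixel L1 over colour channels
    return float((err * mask).sum() / max(mask.sum(), 1.0))
```

Minimising such a loss over a video provides a supervision signal on frames that have no pose annotations, which is what makes sparse labelling viable.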
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.