Egocentric Hand-object Interaction Detection
- URL: http://arxiv.org/abs/2211.09067v1
- Date: Wed, 16 Nov 2022 17:31:40 GMT
- Title: Egocentric Hand-object Interaction Detection
- Authors: Yao Lu, Yanan Liu
- Abstract summary: We use a multi-camera system to capture hand pose data from multiple perspectives.
Our method runs at over $\textbf{30}$ FPS, which is much more efficient than Shan's.
- Score: 13.639883596251313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a method to jointly determine the status of
hand-object interaction. This is crucial for egocentric human activity
understanding and interaction. From a computer vision perspective, we believe
that determining whether a hand is interacting with an object depends on
whether there is an interactive hand pose and whether the hand is touching the
object. Thus, we extract the hand pose and hand-object masks to jointly determine
the interaction status. To handle hand pose estimation errors caused by in-hand
object occlusion, we use a multi-camera system to capture hand pose
data from multiple perspectives. We evaluate and compare our method with the
most recent work from Shan et al. \cite{Shan20} on selected images from
EPIC-KITCHENS \cite{damen2018scaling} dataset and achieve $89\%$ accuracy on
HOI (hand-object interaction) detection, which is comparable to Shan's
($92\%$). In terms of real-time performance, however, our method runs at over
$\textbf{30}$ FPS, which is much more efficient than Shan's
($\textbf{1}\sim\textbf{2}$ FPS). A demo can be found at
https://www.youtube.com/watch?v=XVj3zBuynmQ
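As a rough sketch of the decision rule described above (interaction requires both an interactive hand pose and hand-object contact), the Python snippet below combines a hypothetical pose classifier with a mask-overlap contact test. The function names, the dilation heuristic, and the classifier interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def masks_touch(hand_mask: np.ndarray, object_mask: np.ndarray,
                dilate_iters: int = 3) -> bool:
    """Heuristic contact test: slightly dilate the binary hand mask,
    then check whether it overlaps the in-hand object mask."""
    grown = binary_dilation(hand_mask, iterations=dilate_iters)
    return bool(np.logical_and(grown, object_mask).any())

def hoi_status(hand_pose, hand_mask, object_mask, pose_is_interactive) -> bool:
    """Declare hand-object interaction only when BOTH cues agree:
    (1) the estimated hand pose looks interactive, and
    (2) the hand mask touches the in-hand object mask."""
    return pose_is_interactive(hand_pose) and masks_touch(hand_mask, object_mask)
```

Here `pose_is_interactive` stands in for whatever classifier runs on the estimated hand pose; the AND combination mirrors the paper's stated criterion that both cues must hold.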
Related papers
- UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation [82.93208597526503]
Existing methods are specialized, focusing on either bare hands or hands interacting with objects.
No method can flexibly handle both scenarios, and performance degrades when a model is applied to the other scenario.
We propose UniHOPE, a unified approach for general 3D hand-object pose estimation.
arXiv Detail & Related papers (2025-03-17T15:46:43Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
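For readers unfamiliar with neural fields, the minimal PyTorch sketch below shows the general idea of encoding object geometry as a network that maps a 3D point to a signed distance. This is a generic stand-in for illustration, not the NeuralFeels architecture.

```python
import torch
import torch.nn as nn

class ObjectSDF(nn.Module):
    """Tiny neural field: maps a 3D point (x, y, z) to a signed distance.
    Fitting it online to depth/tactile samples yields an object shape estimate."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance to the object surface
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)
```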
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose [16.65196181081623]
We present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose.
We collect a total of 26.7K hand-object interactions, each including the 3D object shape, the part-level affordance label, and the manually adjusted hand poses.
The comprehensive data analysis shows the common characteristics and diversity of hand-object interactions per affordance.
arXiv Detail & Related papers (2023-09-16T10:25:28Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
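As a generic illustration of mutual attention, the sketch below lets hand and object feature sets attend to each other with standard PyTorch cross-attention. The layer sizes and residual wiring are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    """Bidirectional cross-attention: hand features query object features
    and vice versa, so dependencies flow in both directions."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.hand_to_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_to_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand: torch.Tensor, obj: torch.Tensor):
        # hand: (B, N_hand, dim), obj: (B, N_obj, dim)
        hand_upd, _ = self.hand_to_obj(hand, obj, obj)  # hand attends to object
        obj_upd, _ = self.obj_to_hand(obj, hand, hand)  # object attends to hand
        return hand + hand_upd, obj + obj_upd           # residual updates
```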
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Egocentric Hand-object Interaction Detection and Application [24.68535915849555]
We present a method to detect the hand-object interaction from an egocentric perspective.
We train networks predicting hand pose, hand mask and in-hand object mask to jointly predict the hand-object interaction status.
Our method runs at over $\textbf{30}$ FPS, which is much more efficient than Shan's ($\textbf{1}\sim\textbf{2}$ FPS).
arXiv Detail & Related papers (2021-09-29T21:47:16Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image [71.17227941339935]
We propose a large-scale dataset, InterHand2.6M, and a network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
In our experiments, we demonstrate big gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M.
We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset.
arXiv Detail & Related papers (2020-08-21T05:15:58Z)
- Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands [16.343365158924183]
Manipulation tasks, such as within-hand manipulation, require knowing the object's pose relative to a robot hand.
This paper presents a depth-based framework, which aims for robust pose estimation and short response times.
arXiv Detail & Related papers (2020-03-07T05:51:03Z)