Egocentric Hand-object Interaction Detection
- URL: http://arxiv.org/abs/2211.09067v1
- Date: Wed, 16 Nov 2022 17:31:40 GMT
- Title: Egocentric Hand-object Interaction Detection
- Authors: Yao Lu, Yanan Liu
- Abstract summary: We use a multi-camera system to capture hand pose data from multiple perspectives.
Our method runs at over $\textbf{30}$ FPS, which is much more efficient than Shan's.
- Score: 13.639883596251313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a method to jointly determine the status of
hand-object interaction. This is crucial for egocentric human activity
understanding and interaction. From a computer vision perspective, we believe
that determining whether a hand is interacting with an object depends on
whether there is an interactive hand pose and whether the hand is touching the
object. Thus, we extract the hand pose and hand-object masks to jointly determine
the interaction status. To handle hand pose estimation errors caused by in-hand
object occlusion, we use a multi-camera system to capture hand pose
data from multiple perspectives. We evaluate and compare our method with the
most recent work from Shan et al. \cite{Shan20} on selected images from
EPIC-KITCHENS \cite{damen2018scaling} dataset and achieve $89\%$ accuracy on
HOI (hand-object interaction) detection, which is comparable to Shan's
($92\%$). In terms of real-time performance, however, our method runs at over
$\textbf{30}$ FPS, which is much more efficient than Shan's
($\textbf{1}\sim\textbf{2}$ FPS). A demo can be found at
https://www.youtube.com/watch?v=XVj3zBuynmQ
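As a rough sketch of the decision rule described above (interaction requires both an interactive hand pose and hand-object contact), the Python snippet below combines a hypothetical pose classifier with a mask-overlap contact test. The function names, the dilation heuristic, and the classifier interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def masks_touch(hand_mask: np.ndarray, object_mask: np.ndarray,
                dilate_iters: int = 3) -> bool:
    """Heuristic contact test: slightly dilate the binary hand mask,
    then check whether it overlaps the in-hand object mask."""
    grown = binary_dilation(hand_mask, iterations=dilate_iters)
    return bool(np.logical_and(grown, object_mask).any())

def hoi_status(hand_pose, hand_mask, object_mask, pose_is_interactive) -> bool:
    """Declare hand-object interaction only when BOTH cues agree:
    (1) the estimated hand pose looks interactive, and
    (2) the hand mask touches the in-hand object mask."""
    return pose_is_interactive(hand_pose) and masks_touch(hand_mask, object_mask)
```

Here `pose_is_interactive` stands in for whatever classifier runs on the estimated hand pose; the AND combination mirrors the paper's stated criterion that both cues must hold.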
Related papers
- UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation [82.93208597526503]
Existing methods are specialized, focusing on either bare hands or hands interacting with objects.
No method can flexibly handle both scenarios, and performance degrades when a model is applied to the other scenario.
We propose UniHOPE, a unified approach for general 3D hand-object pose estimation.
arXiv Detail & Related papers (2025-03-17T15:46:43Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
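For readers unfamiliar with neural fields, the minimal PyTorch sketch below shows the general idea of encoding object geometry as a network that maps a 3D point to a signed distance. This is a generic stand-in for illustration, not the NeuralFeels architecture.

```python
import torch
import torch.nn as nn

class ObjectSDF(nn.Module):
    """Tiny neural field: maps a 3D point (x, y, z) to a signed distance.
    Fitting it online to depth/tactile samples yields an object shape estimate."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance to the object surface
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)
```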
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose [16.65196181081623]
We present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose.
We collect a total of 26.7K hand-object interactions, each including the 3D object shape, the part-level affordance label, and the manually adjusted hand poses.
The comprehensive data analysis shows the common characteristics and diversity of hand-object interactions per affordance.
arXiv Detail & Related papers (2023-09-16T10:25:28Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
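As a generic illustration of mutual attention, the sketch below lets hand and object feature sets attend to each other with standard PyTorch cross-attention. The layer sizes and residual wiring are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    """Bidirectional cross-attention: hand features query object features
    and vice versa, so dependencies flow in both directions."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.hand_to_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_to_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand: torch.Tensor, obj: torch.Tensor):
        # hand: (B, N_hand, dim), obj: (B, N_obj, dim)
        hand_upd, _ = self.hand_to_obj(hand, obj, obj)  # hand attends to object
        obj_upd, _ = self.obj_to_hand(obj, hand, hand)  # object attends to hand
        return hand + hand_upd, obj + obj_upd           # residual updates
```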
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Egocentric Hand-object Interaction Detection and Application [24.68535915849555]
We present a method to detect the hand-object interaction from an egocentric perspective.
We train networks predicting hand pose, hand mask and in-hand object mask to jointly predict the hand-object interaction status.
Our method runs at over $\textbf{30}$ FPS, which is much more efficient than Shan's ($\textbf{1}\sim\textbf{2}$ FPS).
arXiv Detail & Related papers (2021-09-29T21:47:16Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image [71.17227941339935]
We propose a large-scale dataset, InterHand2.6M, and a network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
In our experiments, we demonstrate big gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M.
We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset.
arXiv Detail & Related papers (2020-08-21T05:15:58Z)
- Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands [16.343365158924183]
Manipulation tasks, such as within-hand manipulation, require knowing the object's pose relative to a robot hand.
This paper presents a depth-based framework, which aims for robust pose estimation and short response times.
arXiv Detail & Related papers (2020-03-07T05:51:03Z)