Egocentric View Hand Action Recognition by Leveraging Hand Surface and
Hand Grasp Type
- URL: http://arxiv.org/abs/2109.03783v1
- Date: Wed, 8 Sep 2021 17:12:02 GMT
- Title: Egocentric View Hand Action Recognition by Leveraging Hand Surface and
Hand Grasp Type
- Authors: Sangpil Kim, Jihyun Bae, Hyunggun Chi, Sunghee Hong, Byoung Soo Koh,
Karthik Ramani
- Abstract summary: The framework synthesizes the mean curvature of the hand mesh model to encode the hand surface geometry in 3D space.
Using hand grasp type and mean curvature of hand increases the performance of the hand action recognition.
- Score: 15.878905144552204
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a multi-stage framework that uses the mean curvature of the
hand surface and learns the interaction between hand and object by
analyzing the hand grasp type for hand action recognition in egocentric videos. The
proposed method does not require 3D information about objects, including 6D object
poses, which is difficult to annotate when learning an object's behavior while
it interacts with hands. Instead, the framework synthesizes the mean curvature
of the hand mesh model to encode the hand surface geometry in 3D space.
Additionally, our method learns the hand grasp type, which is highly correlated
with the hand action. Our experiments show that using the hand grasp type
and the mean curvature of the hand improves hand action recognition
performance.
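The surface encoding above rests on mean curvature. As an illustration only (not the paper's implementation, which operates on a MANO-style hand mesh), here is a minimal numerical sketch of mean curvature for a smooth parametric surface, computed from the first and second fundamental forms and checked against a sphere, where |H| = 1/R everywhere:

```python
import numpy as np

def mean_curvature(surf, u, v, h=1e-4):
    """Estimate the mean curvature H of a parametric surface
    surf(u, v) -> R^3 at (u, v) using central finite differences."""
    # First partial derivatives r_u, r_v
    ru = (surf(u + h, v) - surf(u - h, v)) / (2 * h)
    rv = (surf(u, v + h) - surf(u, v - h)) / (2 * h)
    # Second partial derivatives r_uu, r_vv, r_uv
    ruu = (surf(u + h, v) - 2 * surf(u, v) + surf(u - h, v)) / h**2
    rvv = (surf(u, v + h) - 2 * surf(u, v) + surf(u, v - h)) / h**2
    ruv = (surf(u + h, v + h) - surf(u + h, v - h)
           - surf(u - h, v + h) + surf(u - h, v - h)) / (4 * h**2)
    # Unit surface normal
    n = np.cross(ru, rv)
    n /= np.linalg.norm(n)
    # First (E, F, G) and second (L, M, N) fundamental form coefficients
    E, F, G = ru @ ru, ru @ rv, rv @ rv
    L, M, N = ruu @ n, ruv @ n, rvv @ n
    # H = (E*N - 2*F*M + G*L) / (2*(E*G - F^2))
    return (E * N - 2 * F * M + G * L) / (2 * (E * G - F**2))

# Sanity check on a sphere of radius R: |H| = 1/R.
R = 2.0
sphere = lambda u, v: R * np.array(
    [np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])
H = mean_curvature(sphere, u=0.7, v=1.3)
print(abs(H))  # ~ 0.5 = 1/R
```

On a triangle mesh such as a hand model, the analogous quantity is usually obtained per vertex from a discrete Laplace-Beltrami (cotangent-weight) operator rather than finite differences.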
Related papers
- DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions [15.417836855005087]
We propose DiffH2O, a novel method to synthesize realistic, one or two-handed object interactions.
We decompose the task into a grasping stage and a text-based interaction stage.
In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.
arXiv Detail & Related papers (2024-03-26T16:06:42Z)
- Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling [18.128376292350836]
We propose a data-driven method for coarse hand motion refinement.
First, we design a hand-centric representation to describe the dynamic spatial-temporal relation between hands and objects.
Second, to capture the dynamic clues of hand-object interaction, we propose a new architecture.
arXiv Detail & Related papers (2024-01-29T09:17:51Z)
- AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose [16.65196181081623]
We present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose.
We collect a total of 26.7K hand-object interactions, each including the 3D object shape, the part-level affordance label, and the manually adjusted hand poses.
The comprehensive data analysis shows the common characteristics and diversity of hand-object interactions per affordance.
arXiv Detail & Related papers (2023-09-16T10:25:28Z)
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
- LG-Hand: Advancing 3D Hand Pose Estimation with Locally and Globally Kinematic Knowledge [0.693939291118954]
We propose LG-Hand, a powerful method for 3D hand pose estimation.
We argue that kinematic information plays an important role, contributing to the performance of 3D hand pose estimation.
Our method achieves promising results on the First-Person Hand Action Benchmark dataset.
arXiv Detail & Related papers (2022-11-06T15:26:32Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.