Egocentric View Hand Action Recognition by Leveraging Hand Surface and Hand Grasp Type
- URL: http://arxiv.org/abs/2109.03783v1
- Date: Wed, 8 Sep 2021 17:12:02 GMT
- Title: Egocentric View Hand Action Recognition by Leveraging Hand Surface and Hand Grasp Type
- Authors: Sangpil Kim, Jihyun Bae, Hyunggun Chi, Sunghee Hong, Byoung Soo Koh,
Karthik Ramani
- Abstract summary: The framework synthesizes the mean curvature of the hand mesh model to encode the hand surface geometry in 3D space.
Using the hand grasp type and the mean curvature of the hand improves hand action recognition performance.
- Score: 15.878905144552204
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a multi-stage framework for hand action recognition in egocentric videos that uses the mean curvature of the hand surface and learns hand-object interaction by analyzing the hand grasp type. The proposed method does not require 3D information about objects, such as 6D object poses, which are difficult to annotate yet are commonly used to model an object's behavior while it interacts with the hands. Instead, the framework synthesizes the mean curvature of the hand mesh model to encode the hand surface geometry in 3D space. Additionally, our method learns the hand grasp type, which is highly correlated with the hand action. Our experiments show that using the hand grasp type and the mean curvature of the hand improves hand action recognition performance.
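The abstract gives no implementation details, but its two ingredients (a mean-curvature descriptor of the hand mesh and a grasp-type signal fused for action classification) can be sketched. The snippet below is a minimal illustration, not the authors' code: it approximates signed per-vertex mean curvature with a uniform graph Laplacian (the paper does not specify its discretization), and the names hand_verts, hand_faces, and grasp_probs are hypothetical placeholders.

```python
# Minimal sketch (not the authors' implementation): approximate per-vertex mean
# curvature of a hand mesh with a uniform graph Laplacian, then fuse a pooled
# curvature histogram with a grasp-type probability vector as a per-frame
# feature for an action classifier. All variable names are illustrative.
import numpy as np

def vertex_normals(verts, faces):
    """Area-weighted per-vertex normals accumulated from triangle faces."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    face_n = np.cross(v1 - v0, v2 - v0)          # face normals, area-weighted
    vert_n = np.zeros_like(verts)
    for k in range(3):                           # scatter-add to incident vertices
        np.add.at(vert_n, faces[:, k], face_n)
    return vert_n / (np.linalg.norm(vert_n, axis=1, keepdims=True) + 1e-12)

def mean_curvature(verts, faces):
    """Signed mean-curvature estimate: H ~ 0.5 * ||L v||, with a uniform
    Laplacian L and the sign taken from the projection onto the vertex normal."""
    nbr_sum = np.zeros_like(verts)
    degree = np.zeros(len(verts))
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    for a, b in edges:                           # accumulate 1-ring neighbours
        nbr_sum[a] += verts[b]; degree[a] += 1
        nbr_sum[b] += verts[a]; degree[b] += 1
    lap = nbr_sum / np.maximum(degree, 1)[:, None] - verts
    sign = np.sign(np.einsum("ij,ij->i", lap, vertex_normals(verts, faces)))
    return 0.5 * sign * np.linalg.norm(lap, axis=1)

def frame_feature(hand_verts, hand_faces, grasp_probs, bins=16):
    """Per-frame descriptor: curvature histogram concatenated with the
    grasp-type distribution (e.g. softmax output of a grasp classifier)."""
    H = mean_curvature(hand_verts, hand_faces)
    hist, _ = np.histogram(H, bins=bins, range=(-2.0, 2.0), density=True)
    return np.concatenate([hist, grasp_probs])
```

A clip would then be encoded as a (T, bins + num_grasp_types) sequence of such per-frame features and passed to a temporal classifier; how the actual multi-stage framework encodes and fuses these signals is described in the paper itself.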
Related papers
- BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects [70.20706475051347]
BimArt is a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects.
We first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation.
The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation.
arXiv Detail & Related papers (2024-12-06T14:23:56Z)
- DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions [15.417836855005087]
We propose a novel method, dubbed DiffH2O, which can synthesize realistic one- or two-handed object interactions.
The method introduces three techniques that enable effective learning from limited data.
arXiv Detail & Related papers (2024-03-26T16:06:42Z)
- Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling [18.128376292350836]
We propose a data-driven method for coarse hand motion refinement.
First, we design a hand-centric representation to describe the dynamic spatial-temporal relation between hands and objects.
Second, to capture the dynamic clues of hand-object interaction, we propose a new architecture.
arXiv Detail & Related papers (2024-01-29T09:17:51Z)
- AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose [16.65196181081623]
We present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose.
We collect a total of 26.7K hand-object interactions, each including the 3D object shape, the part-level affordance label, and the manually adjusted hand poses.
The comprehensive data analysis shows the common characteristics and diversity of hand-object interactions per affordance.
arXiv Detail & Related papers (2023-09-16T10:25:28Z)
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
- LG-Hand: Advancing 3D Hand Pose Estimation with Locally and Globally Kinematic Knowledge [0.693939291118954]
We propose LG-Hand, a powerful method for 3D hand pose estimation.
We argue that kinematic information plays an important role, contributing to the performance of 3D hand pose estimation.
Our method achieves promising results on the First-Person Hand Action Benchmark dataset.
arXiv Detail & Related papers (2022-11-06T15:26:32Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.