Understanding Egocentric Hand-Object Interactions from Hand Pose Estimation
- URL: http://arxiv.org/abs/2109.14657v1
- Date: Wed, 29 Sep 2021 18:34:06 GMT
- Title: Understanding Egocentric Hand-Object Interactions from Hand Pose Estimation
- Authors: Yao Lu and Walterio W. Mayol-Cuevas
- Abstract summary: We propose a method to label a dataset, Ego-Siam, which contains egocentric images pair-wise.
We also use the collected pairwise data to train our encoder-decoder style network, which has proven efficient.
- Score: 24.68535915849555
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we address the problem of estimating the hand pose from the
egocentric view when the hand is interacting with objects. Specifically, we
propose a method to label a dataset, Ego-Siam, which contains egocentric
images pair-wise. We also use the collected pairwise data to train our
encoder-decoder style network, which has proven efficient and brings extra
training efficiency and testing accuracy. Our network is lightweight and runs
at over 30 FPS on an outdated GPU. We demonstrate that our method outperforms
Mueller et al., the state-of-the-art work on egocentric hand-object
interaction, on the GANerated dataset. To show that our method preserves
semantic information, we also report grasp type classification performance on
the GUN-71 dataset, outperforming the benchmark using only the predicted 3D
hand pose.
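As a concrete illustration of the pairwise training idea described in the abstract, here is a minimal PyTorch sketch of a Siamese-style step over an encoder-decoder pose regressor. The architecture, shapes, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of pairwise (Siamese-style) training for a
# lightweight encoder-decoder hand pose network. All module names,
# shapes, and the 0.1 loss weight are assumptions for illustration.
import torch
import torch.nn as nn

class EncoderDecoderPoseNet(nn.Module):
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        # Small convolutional encoder: 128x128 RGB -> 256-d latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256),
        )
        # Decoder regresses 3D joint coordinates from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3),
        )

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z).view(-1, self.num_joints, 3)

def pairwise_step(net, img_a, img_b, joints_gt):
    """One step on a paired sample: the two images show the same hand
    pose (e.g. clean vs. object-occluded), so their latent codes should
    agree while both decoded poses match the shared ground truth."""
    z_a, pose_a = net(img_a)
    z_b, pose_b = net(img_b)
    pose_loss = (nn.functional.mse_loss(pose_a, joints_gt)
                 + nn.functional.mse_loss(pose_b, joints_gt))
    embed_loss = nn.functional.mse_loss(z_a, z_b)  # pairwise consistency
    return pose_loss + 0.1 * embed_loss

net = EncoderDecoderPoseNet()
img_a, img_b = torch.randn(4, 3, 128, 128), torch.randn(4, 3, 128, 128)
joints_gt = torch.randn(4, 21, 3)
pairwise_step(net, img_a, img_b, joints_gt).backward()
```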
Related papers
- Learning Precise Affordances from Egocentric Videos for Robotic Manipulation [18.438782733579064]
Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks.
We present a streamlined affordance learning system that encompasses data collection, effective model training, and robot deployment.
arXiv Detail & Related papers (2024-08-19T16:11:47Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation [26.261767086366866]
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations.
AssemblyHands provides 3.0M annotated images, including 490K egocentric images.
Our study shows that having higher-quality hand poses directly improves the ability to recognize actions.
arXiv Detail & Related papers (2023-04-24T17:52:57Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
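A minimal sketch of what such a dense mutual attention step could look like, assuming per-vertex hand and object features of equal dimension; it applies standard cross-attention in both directions, and all names and sizes are assumptions rather than the paper's implementation.

```python
# Hedged sketch of one bidirectional (mutual) cross-attention block
# between hand and object vertex features; dims and names are assumed.
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.hand_from_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_from_hand = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_hand = nn.LayerNorm(dim)
        self.norm_obj = nn.LayerNorm(dim)

    def forward(self, hand, obj):
        # Hand vertices attend to object vertices and vice versa, so each
        # side's features are refined by fine-grained cross-dependencies.
        h, _ = self.hand_from_obj(query=hand, key=obj, value=obj)
        o, _ = self.obj_from_hand(query=obj, key=hand, value=hand)
        return self.norm_hand(hand + h), self.norm_obj(obj + o)

hand = torch.randn(2, 778, 128)   # e.g. MANO-sized hand vertex features
obj = torch.randn(2, 1000, 128)   # sampled object vertex features
hand, obj = MutualAttention()(hand, obj)
```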
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
- S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show the benefits of using a contact map that constrains hand-object interactions to produce more accurate reconstructions.
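One plausible reading of these consistency constraints, sketched below: a predicted contact map is kept as a pseudo-label only when predictions from two augmented views agree (visual consistency) and predicted contacts lie near the object surface (geometric consistency). Thresholds, shapes, and the selection rule are assumptions for illustration.

```python
# Hypothetical pseudo-label selection via visual + geometric consistency;
# thresholds, shapes, and the selection rule are illustrative assumptions.
import torch

def select_pseudo_labels(contact_a, contact_b, hand_verts, obj_verts,
                         agree_eps=0.1, dist_thresh=0.01):
    # Visual consistency: contact maps from two augmented views agree.
    visual_ok = (contact_a - contact_b).abs().mean(dim=-1) < agree_eps
    # Geometric consistency: every predicted contact vertex must lie
    # within dist_thresh of its nearest object vertex.
    d = torch.cdist(hand_verts, obj_verts).min(dim=-1).values
    geom_ok = ((contact_a > 0.5) <= (d < dist_thresh)).all(dim=-1)
    keep = visual_ok & geom_ok
    return contact_a[keep], keep

contact_a = torch.rand(8, 778)               # per-vertex contact probs
contact_b = contact_a + 0.01 * torch.randn(8, 778)
hand_verts, obj_verts = torch.randn(8, 778, 3), torch.randn(8, 1000, 3)
labels, mask = select_pseudo_labels(contact_a, contact_b, hand_verts, obj_verts)
```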
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
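This weak supervision reads naturally as a feature distillation loss: a frozen, pretrained external-view encoder provides target features that the trainable egocentric encoder must imitate on time-synchronized frames. A minimal sketch under that assumption, with all names and shapes illustrative:

```python
# Hedged sketch of external-view feature supervision as distillation;
# both encoders and all shapes here are placeholder assumptions.
import torch
import torch.nn as nn

def tiny_encoder():
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 256))

ego_encoder = tiny_encoder()            # trainable egocentric branch
ext_encoder = tiny_encoder()            # stands in for a pretrained model
for p in ext_encoder.parameters():
    p.requires_grad = False             # the external-view teacher is frozen

ego_img = torch.randn(4, 3, 256, 256)   # egocentric (e.g. fisheye) frames
ext_img = torch.randn(4, 3, 256, 256)   # time-synchronized external views
with torch.no_grad():
    target = ext_encoder(ext_img)       # high-quality teacher features
loss = nn.functional.mse_loss(ego_encoder(ego_img), target)
loss.backward()
```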
arXiv Detail & Related papers (2022-01-20T00:45:13Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand Pose Estimation [32.12879364117658]
Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity makes the problem ill-posed.
We design a spin match algorithm that enables rigid mesh model matching with any target mesh ground truth.
We present a multi-view hand pose estimation approach to verify that training a hand pose estimator with our generated dataset greatly enhances the performance.
arXiv Detail & Related papers (2020-12-06T07:55:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.