Human Action Recognition in Egocentric Perspective Using 2D Object and
Hands Pose
- URL: http://arxiv.org/abs/2306.05147v1
- Date: Thu, 8 Jun 2023 12:15:16 GMT
- Title: Human Action Recognition in Egocentric Perspective Using 2D Object and
Hands Pose
- Authors: Wiktor Mucha and Martin Kampel
- Abstract summary: Egocentric action recognition is essential for healthcare and assistive technology that relies on egocentric cameras.
This study explores the feasibility of using 2D hand and object pose information for egocentric action recognition.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Egocentric action recognition is essential for healthcare and assistive
technology that relies on egocentric cameras because it allows for the
automatic and continuous monitoring of activities of daily living (ADLs)
without requiring any conscious effort from the user. This study explores the
feasibility of using 2D hand and object pose information for egocentric action
recognition. While current literature focuses on 3D hand pose information, our
work shows that using 2D skeleton data is a promising approach for hand-based
action classification that may also enhance privacy and reduce computational
cost. The study uses a state-of-the-art transformer-based
method to classify sequences and achieves a validation accuracy of 94%,
outperforming other existing solutions. Accuracy on the test subset drops
to 76%, indicating the need for improved generalization. This
research highlights the potential of 2D hand and object pose information for
action recognition tasks and offers a promising alternative to 3D-based
methods.
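As a rough illustration of the approach the abstract describes (not the authors' implementation), the sketch below encodes each frame's 2D hand keypoints and object bounding box into a feature vector and pools the sequence with a single self-attention layer before a linear classifier. All shapes, joint counts, and the random weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch, assuming 21 joints per hand with (x, y) coords
# plus a 4-value object bounding box per frame -- not the paper's code.
rng = np.random.default_rng(0)

N_FRAMES, N_JOINTS, N_CLASSES = 16, 21, 5
D = 2 * N_JOINTS * 2 + 4  # two hands of (x, y) joints + object box -> 88 dims

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode_frame(left_hand, right_hand, obj_box):
    """Flatten per-frame 2D poses (21x2 each) and the object box (4,) into one vector."""
    return np.concatenate([left_hand.ravel(), right_hand.ravel(), obj_box])

def classify(seq, W_qkv, W_out):
    """Single-head self-attention over the frame sequence, mean-pooled, then linear."""
    q, k, v = (seq @ W for W in W_qkv)               # each (T, D)
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)    # (T, T) frame-to-frame weights
    pooled = (attn @ v).mean(axis=0)                 # (D,) clip-level feature
    return softmax(pooled @ W_out)                   # class probabilities

# Random stand-in data: a 16-frame clip of 2D poses.
frames = np.stack([
    encode_frame(rng.random((N_JOINTS, 2)), rng.random((N_JOINTS, 2)), rng.random(4))
    for _ in range(N_FRAMES)
])
W_qkv = [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)]
W_out = rng.normal(scale=0.1, size=(D, N_CLASSES))

probs = classify(frames, W_qkv, W_out)
print(probs.shape)  # (5,)
```

A real system would stack several such layers with learned weights, positional encodings, and training, but the input representation (flattened 2D keypoints per frame) is the part that distinguishes this line of work from 3D-pose pipelines.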
Related papers
- Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [11.090775523892074]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data.
Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues.
Our method achieves up to 85% of the fully-supervised performance using only 10% labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z)
- HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes [10.237077867790612]
We present HOIMotion, a novel approach for human motion forecasting during human-object interactions.
Our method integrates information about past body poses and egocentric 3D object bounding boxes.
We show that HOIMotion consistently outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-07-02T19:58:35Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that encodes the self-occlusions of foreground 3D objects in a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D multi-object tracking (MOT) problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose [36.384824115033304]
We propose an approach to learning a compact view-invariant embedding space from 2D body joint keypoints, without explicitly predicting 3D poses.
Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views.
arXiv Detail & Related papers (2020-10-23T17:58:35Z)
- A review of 3D human pose estimation algorithms for markerless motion capture [0.0]
We review the leading human pose estimation methods of the past five years, focusing on metrics, benchmarks and method structures.
We propose a taxonomy based on accuracy, speed and robustness that we use to classify the methods and derive directions for future research.
arXiv Detail & Related papers (2020-10-13T15:07:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.