Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition
- URL: http://arxiv.org/abs/2110.14994v1
- Date: Thu, 28 Oct 2021 10:09:34 GMT
- Title: Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition
- Authors: Liang Xu, Cuiling Lan, Wenjun Zeng, Cewu Lu
- Abstract summary: We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves performance that is the best among or competitive with state-of-the-art methods for human action recognition.
- Score: 111.87412719773889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton data carries valuable motion information and is widely explored in
human action recognition. However, beyond the motion itself, a person's
interaction with the environment also provides discriminative cues for
recognizing the action. In this paper, we propose a joint learning framework for
mutually assisted "interacted object localization" and "human action
recognition" based on skeleton data. The two tasks are serialized together and
collaborate to promote each other, where preliminary action type derived from
skeleton alone helps improve interacted object localization, which in turn
provides valuable cues for the final human action recognition. Besides, we
exploit the temporal consistency of the interacted object as a constraint to
better localize it in the absence of ground-truth labels.
Extensive experiments on the SYSU-3D, NTU RGB+D 60, and Northwestern-UCLA
datasets show that our method achieves performance that is the best among or
competitive with state-of-the-art methods for human action recognition.
Visualization results show that our method can also provide reasonable
interacted object localization results.
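To make the serialized pipeline concrete, below is a minimal PyTorch sketch of the loop the abstract describes: a skeleton encoder yields a preliminary action prediction, that action prior conditions the interacted-object localizer, and the localized object cues feed the final classifier. Every module name, layer choice, and the per-frame box parameterization here is an illustrative assumption, not the authors' implementation; the temporal-consistency term is one plausible reading of the constraint, penalizing frame-to-frame jumps of the predicted box.

```python
# Illustrative sketch only: module names and shapes are assumptions,
# not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutuallyAssistedNet(nn.Module):
    def __init__(self, num_joints=25, num_actions=60, feat_dim=256):
        super().__init__()
        # Stage 1: skeleton-only temporal encoder -> preliminary action logits
        self.skeleton_encoder = nn.GRU(num_joints * 3, feat_dim, batch_first=True)
        self.prelim_head = nn.Linear(feat_dim, num_actions)
        # Stage 2: regress an interacted-object box per frame, conditioned on the action prior
        self.object_localizer = nn.Linear(feat_dim + num_actions, 4)  # (x, y, w, h)
        # Stage 3: final recognition from pooled skeleton features plus object cues
        self.final_head = nn.Linear(feat_dim + 4, num_actions)

    def forward(self, skel):                      # skel: (B, T, J*3)
        feats, _ = self.skeleton_encoder(skel)    # (B, T, D)
        pooled = feats.mean(dim=1)                # temporal average pooling
        prelim_logits = self.prelim_head(pooled)  # preliminary action type
        prior = prelim_logits.softmax(dim=-1)
        # Broadcast the action prior over time and regress one box per frame.
        prior_t = prior.unsqueeze(1).expand(-1, feats.size(1), -1)
        boxes = self.object_localizer(torch.cat([feats, prior_t], dim=-1))
        # Object cues feed back into the final action prediction.
        final_logits = self.final_head(torch.cat([pooled, boxes.mean(dim=1)], dim=-1))
        return prelim_logits, boxes, final_logits

def temporal_consistency_loss(boxes):
    # Without ground-truth boxes, penalize abrupt frame-to-frame changes in
    # the localized object; one way to encode the temporal-consistency constraint.
    return F.smooth_l1_loss(boxes[:, 1:], boxes[:, :-1])
```

During training, supervised action losses on prelim_logits and final_logits would be combined with temporal_consistency_loss(boxes), letting the localizer learn without box annotations.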
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaboratively guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - LEMON: Learning 3D Human-Object Interaction Relation from 2D Images [56.6123961391372]
Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling.
Most existing methods approach the goal by learning to predict isolated interaction elements.
We present LEMON, a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations.
arXiv Detail & Related papers (2023-12-14T14:10:57Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Since human features contribute more to the interaction, we propose a Human-Guide Linking method to ensure the interaction decoder focuses on human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z) - How Object Information Improves Skeleton-based Human Action Recognition
in Assembly Tasks [12.349172146831506]
We present a novel approach of integrating object information into skeleton-based action recognition.
We enhance two state-of-the-art methods by treating object centers as additional skeleton joints (see the sketch after this entry).
Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks.
arXiv Detail & Related papers (2023-06-09T12:18:14Z) - Modelling Spatio-Temporal Interactions for Compositional Action
- Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z) - Human Interaction Recognition Framework based on Interacting Body Part
Attention [24.913372626903648]
We propose a novel framework that simultaneously considers both implicit and explicit representations of human interactions.
The proposed method captures the subtle difference between different interactions using interacting body part attention.
We validate the effectiveness of the proposed method using four widely used public datasets.
arXiv Detail & Related papers (2021-01-22T06:52:42Z) - Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task (see the sketch after this entry).
We show large improvements over current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z) - Attention-Oriented Action Recognition for Real-Time Human-Robot
- Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
A compact CNN then receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z)