Human Interaction Recognition Framework based on Interacting Body Part
Attention
- URL: http://arxiv.org/abs/2101.08967v1
- Date: Fri, 22 Jan 2021 06:52:42 GMT
- Title: Human Interaction Recognition Framework based on Interacting Body Part
Attention
- Authors: Dong-Gyu Lee, Seong-Whan Lee
- Abstract summary: We propose a novel framework that simultaneously considers both implicit and explicit representations of human interactions.
The proposed method captures the subtle difference between different interactions using interacting body part attention.
We validate the effectiveness of the proposed method using four widely used public datasets.
- Score: 24.913372626903648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity recognition in videos has been widely studied and has recently
gained significant advances with deep learning approaches; however, it remains
a challenging task. In this paper, we propose a novel framework that
simultaneously considers both implicit and explicit representations of human
interactions by fusing information from the local image regions where the interaction actively occurs, the primitive motion with the posture of each subject's body parts, and the co-occurrence of the overall appearance change. Human
interactions change, depending on how the body parts of each human interact
with the other. The proposed method captures the subtle difference between
different interactions using interacting body part attention. Semantically
important body parts that interact with other objects are given more weight
during feature representation. The combined feature of interacting body part
attention-based individual representation and the co-occurrence descriptor of
the full-body appearance change is fed into long short-term memory to model the
temporal dynamics in a single framework. We validate the
effectiveness of the proposed method on four widely used public datasets, on
which it outperforms competing state-of-the-art methods.
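The fusion step the abstract describes (interacting-body-part attention pooled into an individual representation, concatenated with a co-occurrence descriptor, then fed to an LSTM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the softmax attention form, and the per-frame concatenation are assumptions; the paper's actual attention scores and co-occurrence descriptor are computed from learned features.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_body_parts(part_feats, interaction_scores):
    """Weight body-part features by their relevance to the interaction.

    part_feats: (P, D) array, one feature vector per body part.
    interaction_scores: (P,) relevance of each part to the interaction.
    Returns a (D,) attention-pooled individual representation, so parts
    that interact with other subjects receive more weight.
    """
    w = softmax(interaction_scores)  # attention weights over the P parts
    return w @ part_feats            # weighted sum of part features

def frame_descriptor(part_feats, interaction_scores, cooccurrence):
    """Per-frame combined feature: attended individual representation
    concatenated with the co-occurrence descriptor of the full-body
    appearance change. The sequence of these descriptors would then be
    fed to an LSTM to model the temporal dynamics."""
    pooled = attend_body_parts(part_feats, interaction_scores)
    return np.concatenate([pooled, cooccurrence])
```

In this sketch, a body part whose interaction score dominates contributes almost all of the pooled representation, which is the behavior the paper attributes to semantically important interacting parts.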
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - Modelling Spatio-Temporal Interactions for Compositional Action
Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z) - Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z) - Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI
Detection [39.61023122191333]
Human-Object Interaction (HOI) detection plays a crucial role in activity understanding.
Previous works focus only on the target person (i.e., a local perspective) and overlook information from the other persons.
In this paper, we argue that comparing the body parts of multiple persons simultaneously affords more useful and complementary interactiveness cues.
We construct body-part saliency maps based on self-attention to mine cross-person informative cues and learn the holistic relationships between all the body-parts.
arXiv Detail & Related papers (2022-07-28T15:57:51Z) - IGFormer: Interaction Graph Transformer for Skeleton-based Human
Interaction Recognition [26.05948629634753]
We propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition.
IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts.
We also propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence.
arXiv Detail & Related papers (2022-07-25T12:11:15Z) - Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z) - Spatio-Temporal Interaction Graph Parsing Networks for Human-Object
Interaction Recognition [55.7731053128204]
In a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video.
With effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies.
Making full use of appearance features, spatial locations, and semantic information is also key to improving video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.