Pose-based Modular Network for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2008.02042v1
- Date: Wed, 5 Aug 2020 10:56:09 GMT
- Title: Pose-based Modular Network for Human-Object Interaction Detection
- Authors: Zhijun Liang, Junfa Liu, Yisheng Guan, and Juan Rojas
- Abstract summary: We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection.
To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks.
- Score: 5.6397911482914385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interaction (HOI) detection is a critical task in scene
understanding. The goal is to infer the triplet <subject, predicate, object> in
a scene. In this work, we note that the human pose itself as well as the
relative spatial information of the human pose with respect to the target
object can provide informative cues for HOI detection. We contribute a
Pose-based Modular Network (PMN) which explores the absolute pose features and
relative spatial pose features to improve HOI detection and is fully compatible
with existing networks. Our module consists of two branches: one first processes
the relative spatial pose features of each joint independently, while the other
updates the absolute pose features via fully connected graph structures. The
processed pose features are then fed into an action classifier. To evaluate our
proposed method, we combine the module with the state-of-the-art model named
VS-GATs and obtain significant improvement on two public benchmarks: V-COCO and
HICO-DET, which shows its efficacy and flexibility. Code is available at
\url{https://github.com/birlrobotics/PMN}.
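The two-branch design described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions (17 COCO-style joints, 64-D hidden features, 29 V-COCO-like action classes), the single-layer MLPs, and the mean-aggregation stand-in for the fully connected graph update are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 17 COCO-style joints,
# 2-D coordinates, 64-D hidden features, 29 action classes.
N_JOINTS, COORD_DIM, HIDDEN, N_ACTIONS = 17, 2, 64, 29

def mlp(x, w, b):
    """Single linear layer with ReLU, applied along the last axis."""
    return np.maximum(x @ w + b, 0.0)

# Branch 1: processes the relative spatial pose feature of each joint
# independently (the same weights are shared across all joints).
w_rel = rng.standard_normal((COORD_DIM, HIDDEN)) * 0.1
b_rel = np.zeros(HIDDEN)

# Branch 2: updates absolute pose features over a fully connected graph.
# Mean aggregation over all joints stands in for the paper's graph
# update; it is not the exact formulation.
w_abs = rng.standard_normal((COORD_DIM, HIDDEN)) * 0.1
b_abs = np.zeros(HIDDEN)

# Action classifier over the pooled features from both branches.
w_cls = rng.standard_normal((2 * HIDDEN, N_ACTIONS)) * 0.1

def pmn_forward(abs_pose, obj_center):
    """abs_pose: (N_JOINTS, 2) joint coordinates; obj_center: (2,)."""
    rel_pose = abs_pose - obj_center          # relative spatial pose
    rel_feat = mlp(rel_pose, w_rel, b_rel)    # per-joint, shared weights
    abs_feat = mlp(abs_pose, w_abs, b_abs)
    # Fully connected graph update: every joint sees every other joint.
    abs_feat = abs_feat + abs_feat.mean(axis=0, keepdims=True)
    # Pool over joints, concatenate both branches, classify the action.
    pooled = np.concatenate([rel_feat.mean(0), abs_feat.mean(0)])
    return pooled @ w_cls                     # action logits

logits = pmn_forward(rng.standard_normal((N_JOINTS, COORD_DIM)),
                     rng.standard_normal(COORD_DIM))
print(logits.shape)  # (29,)
```

In the actual model these logits would be fused with the scores of the base detector (e.g. VS-GATs), which is how the module stays compatible with existing networks.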
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection [1.2183405753834557]
This paper presents the human and object keypoint-based extension module (HOKEM) as an easy-to-use extension module to improve the accuracy of the conventional detection models.
Experiments using the HOI dataset, V-COCO, showed that HOKEM boosted the accuracy of an appearance-based model by a large margin.
arXiv Detail & Related papers (2023-06-25T14:40:26Z) - ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection [20.983998911754792]
Two-stage Human-Object Interaction (HOI) detectors suffer from lower performance than one-stage methods.
We propose Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems.
ViPLO achieves the state-of-the-art results on two public benchmarks.
arXiv Detail & Related papers (2023-04-17T09:44:54Z) - A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection [14.900704382194013]
We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
arXiv Detail & Related papers (2022-07-11T15:20:18Z) - Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is created on top of two Faster-RCNN models that share their features to get enhanced representations of both.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z) - Spatial Priming for Detecting Human-Object Interactions [89.22921959224396]
We present a method for exploiting spatial layout information for detecting human-object interactions (HOIs) in images.
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
The proposed model reaches an mAP of 24.79% for HICO-Det dataset which is about 2.8% absolute points higher than the current state-of-the-art.
arXiv Detail & Related papers (2020-04-09T23:20:30Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.