HOKEM: Human and Object Keypoint-based Extension Module for Human-Object
Interaction Detection
- URL: http://arxiv.org/abs/2306.14260v1
- Date: Sun, 25 Jun 2023 14:40:26 GMT
- Title: HOKEM: Human and Object Keypoint-based Extension Module for Human-Object
Interaction Detection
- Authors: Yoshiki Ito
- Abstract summary: This paper presents the human and object keypoint-based extension module (HOKEM) as an easy-to-use extension module to improve the accuracy of the conventional detection models.
Experiments using the HOI dataset, V-COCO, showed that HOKEM boosted the accuracy of an appearance-based model by a large margin.
- Score: 1.2183405753834557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interaction (HOI) detection for capturing relationships between
humans and objects is an important task in the semantic understanding of
images. When processing human and object keypoints extracted from an image
using a graph convolutional network (GCN) to detect HOI, it is crucial to
extract appropriate object keypoints regardless of the object type and to
design a GCN that accurately captures the spatial relationships between
keypoints. This paper presents the human and object keypoint-based extension
module (HOKEM) as an easy-to-use extension module to improve the accuracy of
the conventional detection models. The proposed object keypoint extraction
method is simple yet accurately represents the shapes of various objects.
Moreover, the proposed human-object adaptive GCN (HO-AGCN), which introduces
adaptive graph optimization and attention mechanism, accurately captures the
spatial relationships between keypoints. Experiments using the HOI dataset,
V-COCO, showed that HOKEM boosted the accuracy of an appearance-based model by
a large margin.
Related papers
- Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model [19.800353299691277]
This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior.
We propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world.
arXiv Detail & Related papers (2024-08-02T06:32:45Z) - A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called textbfContext-textbfEnhanced textbfFeature textbfAment (CEFA)
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - Geometric Features Enhanced Human-Object Interaction Detection [11.513009304308724]
We propose a novel end-to-end Transformer-style HOI detection model, i.e., geometric features enhanced HOI detector (GeoHOI)
One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet.
GeoHOI effectively upgrades a Transformer-based HOI detector benefiting from the keypoints similarities measuring the likelihood of human-object interactions.
arXiv Detail & Related papers (2024-06-26T18:52:53Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural
Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - A Skeleton-aware Graph Convolutional Network for Human-Object
Interaction Detection [14.900704382194013]
We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
arXiv Detail & Related papers (2022-07-11T15:20:18Z) - DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z) - Pose-based Modular Network for Human-Object Interaction Detection [5.6397911482914385]
We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection.
To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks.
arXiv Detail & Related papers (2020-08-05T10:56:09Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the inter-action.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.