QAHOI: Query-Based Anchors for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2112.08647v1
- Date: Thu, 16 Dec 2021 05:52:23 GMT
- Title: QAHOI: Query-Based Anchors for Human-Object Interaction Detection
- Authors: Junwen Chen and Keiji Yanai
- Abstract summary: One-stage approaches have become a new trend for human-object interaction (HOI) detection due to their high efficiency.
We propose a transformer-based method, QAHOI, which uses query-based anchors to predict all the elements of an HOI instance.
We find that a powerful backbone significantly increases accuracy for QAHOI, and QAHOI with a transformer-based backbone outperforms recent state-of-the-art methods by large margins on the HICO-DET benchmark.
- Score: 29.548384966666013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-object interaction (HOI) detection, a downstream task of object
detection, requires localizing pairs of humans and objects and extracting the
semantic relationships between humans and objects from an image. Recently,
one-stage approaches have become a new trend for this task due to their high
efficiency. However, these approaches focus on detecting possible interaction
points or filtering human-object pairs, ignoring the variability in the
location and size of different objects at spatial scales. To address this
problem, we propose a transformer-based method, QAHOI (Query-Based Anchors for
Human-Object Interaction detection), which leverages a multi-scale architecture
to extract features from different spatial scales and uses query-based anchors
to predict all the elements of an HOI instance. We further find that a
powerful backbone significantly increases accuracy for QAHOI, and QAHOI with a
transformer-based backbone outperforms recent state-of-the-art methods by large
margins on the HICO-DET benchmark. The source code is available at
https://github.com/cjw2021/QAHOI.
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection [35.2385914946471]
We propose a one-stage meta-architecture for HOI detection powered by a novel union-level detector.
Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time (4x to 14x).
arXiv Detail & Related papers (2023-12-19T23:34:43Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection [4.534713782093219]
A novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems.
FGAHOI comprises three dedicated components: multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM).
arXiv Detail & Related papers (2023-01-08T03:53:50Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Decoupled Adaptation for Cross-Domain Object Detection [69.5852335091519]
Cross-domain object detection is more challenging than object classification.
D-adapt achieves state-of-the-art results on four cross-domain object detection tasks.
arXiv Detail & Related papers (2021-10-06T08:43:59Z)
- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior [28.99655101929647]
Human-Object Interaction (HOI) detection aims to detect visual relations between humans and objects in images.
Non-interactive human-object pairs can easily be mis-grouped and misclassified as an action.
We propose a spatial enhancement approach to enforce fine-level spatial constraints in two directions.
arXiv Detail & Related papers (2021-05-07T07:18:27Z)
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [26.664864824357164]
We present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image.
Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.
arXiv Detail & Related papers (2021-04-28T10:10:29Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.