Reformulating HOI Detection as Adaptive Set Prediction
- URL: http://arxiv.org/abs/2103.05983v1
- Date: Wed, 10 Mar 2021 10:40:33 GMT
- Title: Reformulating HOI Detection as Adaptive Set Prediction
- Authors: Mingfei Chen, Yue Liao, Si Liu, Zhiyuan Chen, Fei Wang, Chen Qian
- Abstract summary: We reformulate HOI detection as an adaptive set prediction problem.
We propose an Adaptive Set-based one-stage framework (AS-Net) with parallel instance and interaction branches.
Our method outperforms previous state-of-the-art methods without any extra human pose and language features.
- Score: 25.44630995307787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determining which image regions to concentrate on is critical for
Human-Object Interaction (HOI) detection. Conventional HOI detectors focus on
either detected human and object pairs or pre-defined interaction locations,
which limits learning of the effective features. In this paper, we reformulate
HOI detection as an adaptive set prediction problem. With this novel
formulation, we propose an Adaptive Set-based one-stage framework (AS-Net) with
parallel instance and interaction branches. To attain this, we map a trainable
interaction query set to an interaction prediction set with a transformer. Each
query adaptively aggregates the interaction-relevant features from global
contexts through multi-head co-attention. Besides, the training process is
supervised adaptively by matching each ground-truth with the interaction
prediction. Furthermore, we design an effective instance-aware attention module
to introduce instructive features from the instance branch into the interaction
branch. Our method outperforms previous state-of-the-art methods without any
extra human pose and language features on three challenging HOI detection
datasets. In particular, we achieve over $31\%$ relative improvement on the
large-scale HICO-DET dataset. Code is available at
https://github.com/yoyomimi/AS-Net.
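The abstract says training is supervised by matching each ground-truth with an interaction prediction, but it does not give the matching cost. The following is a minimal sketch of one-to-one set matching under stated assumptions: the cost function `match_cost` (L1 distance between interaction points plus a class-mismatch penalty), the tuple layout of predictions, and the brute-force search are all hypothetical and are not taken from the AS-Net code. Set-prediction detectors commonly use the Hungarian algorithm for this step; brute force is used here only to keep the example dependency-free.

```python
from itertools import permutations


def match_cost(pred, gt):
    """Hypothetical cost: L1 distance between points plus a class-mismatch penalty."""
    (px, py), p_label = pred
    (gx, gy), g_label = gt
    return abs(px - gx) + abs(py - gy) + (0.0 if p_label == g_label else 1.0)


def match_ground_truth(preds, gts):
    """Return (gt_index, pred_index) pairs that minimise the total matching cost.

    Each ground truth is assigned to a distinct prediction. The search is
    exhaustive, so it is only practical for very small sets.
    """
    assert len(gts) <= len(preds), "need at least one prediction per ground truth"
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


# Toy data (assumed format): ((x, y) interaction point, interaction label).
preds = [((0.9, 0.9), "ride"), ((0.1, 0.2), "hold"), ((0.5, 0.5), "ride")]
gts = [((0.1, 0.1), "hold"), ((0.55, 0.5), "ride")]
matches = match_ground_truth(preds, gts)
print(matches)
```

Predictions left unmatched would typically be supervised as "no interaction"; that part is omitted from the sketch.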
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Parallel Reasoning Network for Human-Object Interaction Detection [53.422076419484945]
We propose a new transformer-based method named Parallel Reasoning Network (PR-Net).
PR-Net constructs two independent predictors for instance-level localization and relation-level understanding.
Our PR-Net has achieved competitive results on HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-01-09T17:00:34Z)
- Consistency Learning via Decoding Path Augmentation for Transformers in
Human Object Interaction Detection [11.928724924319138]
We propose cross-path consistency learning (CPC) to improve HOI detection for transformers.
Our experiments demonstrate the effectiveness of our method, and we achieved significant improvement on V-COCO and HICO-DET.
arXiv Detail & Related papers (2022-04-11T02:45:00Z)
- RR-Net: Injecting Interactive Semantics in Human-Object Interaction
Detection [40.65483058890176]
The latest end-to-end HOI detectors lack relation reasoning, which leaves them unable to learn HOI-specific interactive semantics for predictions.
We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference.
Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net).
arXiv Detail & Related papers (2021-04-30T14:03:10Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Novel Human-Object Interaction Detection via Adversarial Domain
Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)
- Asynchronous Interaction Aggregation for Action Detection [43.34864954534389]
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
There are two key designs in it: one is the Interaction Aggregation structure (IA) adopting a uniform paradigm to model and integrate multiple types of interaction; the other is the Asynchronous Memory Update algorithm (AMU) that enables us to achieve better performance.
arXiv Detail & Related papers (2020-04-16T07:03:20Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)