Mining the Benefits of Two-stage and One-stage HOI Detection
- URL: http://arxiv.org/abs/2108.05077v1
- Date: Wed, 11 Aug 2021 07:38:09 GMT
- Title: Mining the Benefits of Two-stage and One-stage HOI Detection
- Authors: Aixi Zhang, Yue Liao, Si Liu, Miao Lu, Yongliang Wang, Chen Gao,
Xiaobo Li
- Abstract summary: Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years.
One-stage methods struggle to strike an appropriate trade-off in multi-task learning, i.e., between object detection and interaction classification.
We propose a novel one-stage framework that disentangles human-object detection and interaction classification in a cascade manner.
- Score: 26.919979955155664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-stage methods have dominated Human-Object Interaction (HOI) detection for
several years. Recently, one-stage HOI detection methods have become popular.
In this paper, we aim to explore the essential pros and cons of two-stage and
one-stage methods. With this goal in mind, we find that conventional two-stage
methods mainly struggle to locate positive interactive human-object pairs,
while one-stage methods struggle to strike an appropriate trade-off in
multi-task learning, i.e., between object detection and interaction classification.
Therefore, a core problem is how to keep the strengths and discard the weaknesses
of the two conventional types of methods. To this end, we propose a novel
one-stage framework that disentangles human-object detection and interaction
classification in a cascade manner. In detail, we first design a human-object
pair generator based on a state-of-the-art one-stage HOI detector by removing
its interaction classification module or head, and then design a relatively
isolated interaction classifier to classify each human-object pair. The two cascade
decoders in our proposed framework can each focus on one specific task, detection or
interaction classification. For the specific implementation, we adopt a
transformer-based HOI detector as our base model. The newly introduced
disentangling paradigm outperforms existing methods by a large margin, with a
significant relative mAP gain of 9.32% on HICO-Det.
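
The cascade described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PairProposal`, `HOIPrediction`, `detect_hoi`, `toy_pair_generator`, and `toy_interaction_classifier` are hypothetical, and the two stages are plain Python callables standing in for the two transformer decoders. The toy boxes, labels, and scores are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class PairProposal:
    """Output of stage 1: a candidate human-object pair (hypothetical schema)."""
    human_box: Box
    object_box: Box
    object_label: str
    features: List[float]  # pair embedding handed to stage 2


@dataclass
class HOIPrediction:
    """Output of stage 2: a pair plus a score for each interaction class."""
    pair: PairProposal
    verb_scores: Dict[str, float]


def detect_hoi(
    image: object,
    pair_generator: Callable[[object], List[PairProposal]],
    interaction_classifier: Callable[[PairProposal], Dict[str, float]],
) -> List[HOIPrediction]:
    """Run the two stages in cascade.

    Stage 1 only detects human-object pairs; stage 2 only classifies the
    interaction of each pair, so each stage handles a single task.
    """
    pairs = pair_generator(image)
    return [HOIPrediction(p, interaction_classifier(p)) for p in pairs]


def toy_pair_generator(image: object) -> List[PairProposal]:
    # Placeholder for the detection decoder: returns fixed, made-up proposals.
    person = (10.0, 20.0, 110.0, 220.0)
    return [
        PairProposal(person, (90.0, 150.0, 160.0, 210.0), "bicycle", [0.2, 0.8]),
        PairProposal(person, (300.0, 40.0, 360.0, 90.0), "clock", [0.7, 0.1]),
    ]


def toy_interaction_classifier(pair: PairProposal) -> Dict[str, float]:
    # Placeholder for the interaction decoder: reads scores off the pair features.
    return {"look_at": pair.features[0], "ride": pair.features[1]}


predictions = detect_hoi(None, toy_pair_generator, toy_interaction_classifier)
for pred in predictions:
    best_verb = max(pred.verb_scores, key=pred.verb_scores.get)
    print(pred.pair.object_label, best_verb)
```

Because `detect_hoi` only depends on the two callables, either stage can be replaced or trained separately in this sketch, which mirrors the disentangling idea at the level of data flow only.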
Related papers
- A Review of Human-Object Interaction Detection [6.1941885271010175]
Human-object interaction (HOI) detection plays a key role in high-level visual understanding.
This paper systematically summarizes and discusses the recent work in image-based HOI detection.
arXiv Detail & Related papers (2024-08-20T08:32:39Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Decoupling Object Detection from Human-Object Interaction Recognition [37.133695677465376]
DEFR is a DEtection-FRee method to recognize Human-Object Interactions (HOI) at image level without using object location or human pose.
We present two findings that boost the performance of the detection-free approach, which significantly outperforms the detection-assisted state of the art.
arXiv Detail & Related papers (2021-12-13T03:01:49Z)
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [26.664864824357164]
We present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image.
Our proposed algorithm achieves the state-of-the-art performance in two HOI detection benchmarks with an inference time under 1 ms after object detection.
arXiv Detail & Related papers (2021-04-28T10:10:29Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection [53.40028068801092]
We propose a novel one-stage HOI detection approach based on a new concept called interaction region for the HOI problem.
Unlike previous methods, our approach concentrates on the densely sampled interaction regions across different scales for each human-object pair.
In order to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy.
arXiv Detail & Related papers (2020-10-02T13:57:58Z)
- Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)