Related papers: A Review of Human-Object Interaction Detection

A Review of Human-Object Interaction Detection

URL: http://arxiv.org/abs/2408.10641v1
Date: Tue, 20 Aug 2024 08:32:39 GMT
Title: A Review of Human-Object Interaction Detection
Authors: Yuxiao Wang, Qiwei Xiong, Yu Lei, Weiying Xue, Qi Liu, Zhenao Wei,
Abstract summary: Human-object interaction (HOI) detection plays a key role in high-level visual understanding. This paper systematically summarizes and discusses the recent work in image-based HOI detection.
Score: 6.1941885271010175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accurate localization of human and object instances, as well as the correct classification of object categories and interaction relationships. This paper systematically summarizes and discusses the recent work in image-based HOI detection. First, the mainstream datasets involved in HOI relationship detection are introduced. Furthermore, starting with two-stage methods and end-to-end one-stage detection approaches, this paper comprehensively discusses the current developments in image-based HOI detection, analyzing the strengths and weaknesses of these two methods. Additionally, the advancements of zero-shot learning, weakly supervised learning, and the application of large-scale language models in HOI detection are discussed. Finally, the current challenges in HOI detection are outlined, and potential research directions and future trends are explored.

Related papers

Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction. Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly. Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions. Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det Linking datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
Exploring Predicate Visual Context in Detecting Human-Object Interactions [44.937383506126274]
We study how best to re-introduce image features via cross-attention. Our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-08-11T15:57:45Z)
Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors [36.75629570208193]
Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image. In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels. We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model. Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis
arXiv Detail & Related papers (2023-03-09T19:08:02Z)
Knowledge Guided Bidirectional Attention Network for Human-Object Interaction Detection [3.0915392100355192]
We argue that the independent use of the bottom-up parsing strategy in HOI is counter-intuitive and could lead to the diffusion of attention. We introduce a novel knowledge-guided top-down attention into HOI, and propose to model the relation parsing as a "look and search" process. We implement the process via unifying the bottom-up and top-down attention in a single encoder-decoder based model.
arXiv Detail & Related papers (2022-07-16T16:42:49Z)
ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples. We observe that there exist natural correlations and anti-correlations among human-object interactions. We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection. Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features. In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples. We observe that there exist natural correlations and anti-correlations among human-object interactions. We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. Our network predicts interaction points, which directly localize and classify the inter-action. Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.