Detecting Human-Object Interaction via Fabricated Compositional Learning
- URL: http://arxiv.org/abs/2103.08214v1
- Date: Mon, 15 Mar 2021 08:52:56 GMT
- Title: Detecting Human-Object Interaction via Fabricated Compositional Learning
- Authors: Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao
- Abstract summary: Human-Object Interaction (HOI) detection is a fundamental task for high-level scene understanding.
Humans have an extremely powerful compositional perception ability that lets them recognize rare or unseen HOI samples.
We propose Fabricated Compositional Learning (FCL) to address the problem of open long-tailed HOI detection.
- Score: 106.37536031160282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) detection, inferring the relationships between
human and objects from images/videos, is a fundamental task for high-level
scene understanding. However, HOI detection usually suffers from the open
long-tailed nature of interactions with objects, whereas humans have an
extremely powerful compositional perception ability to recognize rare or unseen
HOI samples. Inspired by this, we devise a novel HOI compositional learning
framework, termed Fabricated Compositional Learning (FCL), to address the
problem of open long-tailed HOI detection. Specifically, we introduce an object
fabricator to generate effective object representations, and then combine verbs
and fabricated objects to compose new HOI samples. With the proposed object
fabricator, we are able to generate large-scale HOI samples for rare and unseen
categories to alleviate the open long-tailed issue in HOI detection. Extensive
experiments on the most popular HOI detection dataset, HICO-DET, demonstrate
the effectiveness of the proposed method for imbalanced HOI detection and show
that it significantly improves the state-of-the-art performance on rare and
unseen HOI categories. Code is available at https://github.com/zhihou7/FCL.
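The abstract outlines two components, an object fabricator that generates object representations and a composition step that pairs verbs with the fabricated objects. A minimal sketch of that idea in PyTorch is given below; the module names, feature dimensions, and the use of an object-class embedding plus Gaussian noise are illustrative assumptions and are not taken from the released code at https://github.com/zhihou7/FCL.

```python
# Hypothetical sketch of fabricated compositional learning: fabricate object
# features for rare/unseen object classes and compose them with verb features.
import torch
import torch.nn as nn


class ObjectFabricator(nn.Module):
    """Generates a fake object feature from a verb feature, an object-class
    embedding, and Gaussian noise (assumed interface, not the authors' code)."""

    def __init__(self, verb_dim=1024, obj_dim=1024, num_obj_classes=80, noise_dim=64):
        super().__init__()
        self.obj_embedding = nn.Embedding(num_obj_classes, noise_dim)
        self.generator = nn.Sequential(
            nn.Linear(verb_dim + 2 * noise_dim, obj_dim),
            nn.ReLU(inplace=True),
            nn.Linear(obj_dim, obj_dim),
        )

    def forward(self, verb_feat, obj_class_ids):
        noise = torch.randn(verb_feat.size(0), self.obj_embedding.embedding_dim,
                            device=verb_feat.device)
        obj_emb = self.obj_embedding(obj_class_ids)
        return self.generator(torch.cat([verb_feat, obj_emb, noise], dim=1))


class HOICompositor(nn.Module):
    """Scores HOI categories from a composed verb/object feature pair."""

    def __init__(self, verb_dim=1024, obj_dim=1024, num_hoi_classes=600):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(verb_dim + obj_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_hoi_classes),
        )

    def forward(self, verb_feat, obj_feat):
        return self.classifier(torch.cat([verb_feat, obj_feat], dim=1))


# Usage sketch: pair existing verb features with fabricated object features for
# sampled rare/unseen object classes to create extra training compositions.
fabricator = ObjectFabricator()
compositor = HOICompositor()
verb_feat = torch.randn(8, 1024)                     # verb features from a backbone
rare_obj_ids = torch.randint(0, 80, (8,))            # sampled rare/unseen object classes
fake_obj_feat = fabricator(verb_feat, rare_obj_ids)  # fabricated object representations
hoi_logits = compositor(verb_feat, fake_obj_feat)    # scores for composed HOI samples
```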
Related papers
- A Review of Human-Object Interaction Detection [6.1941885271010175]
Human-object interaction (HOI) detection plays a key role in high-level visual understanding.
This paper systematically summarizes and discusses the recent work in image-based HOI detection.
arXiv Detail & Related papers (2024-08-20T08:32:39Z)
- UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection [35.2385914946471]
We propose a one-stage meta-architecture for HOI detection powered by a novel union-level detector.
Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time of 4x-14x.
arXiv Detail & Related papers (2023-12-19T23:34:43Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- Compositional Learning in Transformer-Based Human-Object Interaction Detection [6.630793383852106]
Long-tailed distribution of labeled instances is a primary challenge in HOI detection.
Inspired by the nature of HOI triplets, some existing approaches adopt the idea of compositional learning.
We creatively propose a transformer-based framework for compositional HOI learning.
arXiv Detail & Related papers (2023-08-11T06:41:20Z)
- Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors [36.75629570208193]
Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image.
In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels.
We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model.
Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis on implausible interactions.
arXiv Detail & Related papers (2023-03-09T19:08:02Z)
- Affordance Transfer Learning for Human-Object Interaction Detection [106.37536031160282]
We introduce an affordance transfer learning approach to jointly detect HOIs with novel objects and recognize affordances.
Specifically, HOI representations can be decoupled into a combination of affordance and object representations.
With the proposed affordance transfer learning, the model is also capable of inferring the affordances of novel objects from known affordance representations.
arXiv Detail & Related papers (2021-04-07T02:37:04Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- Visual Compositional Learning for Human-Object Interaction Detection [111.05263071111807]
Human-Object Interaction (HOI) detection aims to localize and infer relationships between humans and objects in an image.
It is challenging because the enormous number of possible combinations of object and verb types forms a long-tailed distribution.
We devise a deep Visual Compositional Learning framework, a simple yet efficient approach that effectively addresses this problem.
arXiv Detail & Related papers (2020-07-24T08:37:40Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)