FreeA: Human-object Interaction Detection using Free Annotation Labels
- URL: http://arxiv.org/abs/2403.01840v1
- Date: Mon, 4 Mar 2024 08:38:15 GMT
- Title: FreeA: Human-object Interaction Detection using Free Annotation Labels
- Authors: Yuxiao Wang, Zhenao Wei, Xinyu Jiang, Yu Lei, Weiying Xue, Jinxiu Liu,
Qi Liu
- Abstract summary: We propose a novel self-adaptive language-driven HOI detection method, termed FreeA, which requires no manual labeling.
FreeA matches image features of human-object pairs with HOI text templates, and a mask method based on a priori knowledge is developed to suppress improbable interactions.
Experiments on two benchmark datasets show that FreeA achieves state-of-the-art performance among weakly supervised HOI models.
- Score: 9.537338958326181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent human-object interaction (HOI) detection approaches rely on
costly manual effort and comprehensively annotated image datasets. In this
paper, we propose a novel self-adaptive language-driven HOI detection method,
termed FreeA, which requires no manual labeling: it leverages the adaptability
of CLIP to generate latent HOI labels. Specifically, FreeA matches image
features of human-object pairs with HOI text templates, and a mask method based
on a priori knowledge is developed to suppress improbable interactions. In
addition, FreeA utilizes the proposed interaction correlation matching method
to enhance the likelihood of actions related to a specified action, further
refining the generated HOI labels. Experiments on two benchmark datasets show
that FreeA achieves state-of-the-art performance among weakly supervised HOI
models. Our approach is +8.58 mean Average Precision (mAP) on HICO-DET and
+1.23 mAP on V-COCO more accurate in localizing and classifying interactive
actions than the newest weakly supervised model, and +1.68 mAP and +7.28 mAP
more accurate than the latest weakly+ supervised model, respectively.
Code will be available at https://drliuqi.github.io/.
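As a rough illustration of the labeling pipeline the abstract describes, the hedged sketch below scores candidate human-object pair crops against HOI text templates with CLIP and suppresses improbable verb-object combinations with a prior-knowledge mask. It is a minimal sketch, not the authors' released code: the names `hoi_pseudo_labels`, `pair_crops`, `hoi_templates`, and `prior_mask`, the threshold value, and the use of the OpenAI `clip` package are illustrative assumptions.

```python
# Hedged sketch of CLIP-driven HOI pseudo-labeling (not the authors' code).
# Assumptions: the OpenAI `clip` package is installed, `pair_crops` is a list
# of PIL images cropped around candidate human-object pairs, and
# `hoi_templates` is a list of strings such as
# "a photo of a person riding a bicycle".
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def hoi_pseudo_labels(pair_crops, hoi_templates, prior_mask, threshold=0.02):
    """Return candidate HOI label indices for each human-object pair crop.

    prior_mask: boolean tensor of shape (num_crops, num_templates); False marks
    verb-object combinations ruled out by prior knowledge (e.g. "eat a car").
    """
    images = torch.stack([preprocess(c) for c in pair_crops]).to(device)
    tokens = clip.tokenize(hoi_templates).to(device)

    with torch.no_grad():
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(tokens)

    # Cosine similarity between each pair crop and each HOI text template.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).softmax(dim=-1)

    # Suppress improbable interactions with the prior mask, then threshold.
    scores = scores.masked_fill(~prior_mask.to(device), 0.0)
    return [torch.nonzero(row > threshold).flatten().tolist() for row in scores]
```

The interaction correlation matching step mentioned in the abstract, which raises the likelihood of actions related to a matched action before the final labels are produced, is not shown in this sketch.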
Related papers
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z)
- Decoupling Object Detection from Human-Object Interaction Recognition [37.133695677465376]
DEFR is a DEtection-FRee method to recognize Human-Object Interactions (HOI) at image level without using object location or human pose.
We present two findings that boost the performance of the detection-free approach, which significantly outperforms detection-assisted state-of-the-art methods.
arXiv Detail & Related papers (2021-12-13T03:01:49Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities (a minimal joint-embedding sketch follows the related-papers list below).
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully- and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
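For the ConsNet entry above, which scores candidate pairs by similarity in a visual-semantic joint embedding space, here is a minimal sketch of that scoring idea. The class name, feature dimensions, and linear projection heads are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of visual-semantic joint-embedding scoring in the style of the
# ConsNet summary above; dimensions and projection heads are illustrative only.
import torch
import torch.nn as nn

class JointEmbeddingScorer(nn.Module):
    def __init__(self, visual_dim=1024, word_dim=300, joint_dim=512):
        super().__init__()
        # Project pair visual features and HOI label word embeddings into a
        # shared space and compare them with cosine similarity.
        self.visual_proj = nn.Linear(visual_dim, joint_dim)
        self.label_proj = nn.Linear(word_dim, joint_dim)

    def forward(self, pair_features, label_embeddings):
        v = nn.functional.normalize(self.visual_proj(pair_features), dim=-1)
        t = nn.functional.normalize(self.label_proj(label_embeddings), dim=-1)
        return v @ t.T  # (num_pairs, num_hoi_labels) similarity scores

# Usage: similarity scores for 8 candidate pairs against 600 HOI labels.
scorer = JointEmbeddingScorer()
scores = scorer(torch.randn(8, 1024), torch.randn(600, 300))
```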
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.