The Overlooked Classifier in Human-Object Interaction Recognition
- URL: http://arxiv.org/abs/2203.05676v1
- Date: Thu, 10 Mar 2022 23:35:00 GMT
- Title: The Overlooked Classifier in Human-Object Interaction Recognition
- Authors: Ying Jin, Yinpeng Chen, Lijuan Wang, Jianfeng Wang, Pei Yu, Lin Liang,
Jenq-Neng Hwang, Zicheng Liu
- Abstract summary: We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
- Score: 82.20671129356037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) recognition is challenging due to two factors:
(1) significant imbalance across classes and (2) the need for multiple labels per
image. This paper shows that these two challenges can be effectively addressed
by improving the classifier with the backbone architecture untouched. Firstly,
we encode the semantic correlation among classes into the classification head
by initializing the weights with language embeddings of HOIs. As a result, the
performance is boosted significantly, especially for the few-shot subset.
Secondly, we propose a new loss named LSE-Sign to enhance multi-label learning
on a long-tailed dataset. Our simple yet effective method enables
detection-free HOI classification, outperforming state-of-the-art methods that
require object detection and human pose by a clear margin. Moreover, we
transfer the classification model to instance-level HOI detection by connecting
it with an off-the-shelf object detector. We achieve state-of-the-art results without
additional fine-tuning.
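The two classifier-side ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract gives neither the embedding model nor the loss formula, so everything below is an assumption. The toy three-dimensional "language embeddings", the helper names (`init_head_from_embeddings`, `lse_sign_loss`), the scale `gamma`, and the loss form log(1 + sum_i exp(-y_i * gamma * s_i)) with y_i in {+1, -1} are all hypothetical, chosen only to be consistent with the name "LSE-Sign" (log-sum-exp over sign-weighted scores).

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def init_head_from_embeddings(class_embeddings):
    """Build classifier weights (one row per class) from per-class text embeddings.

    Assumption: each row is the L2-normalized embedding of the class name.
    """
    return [normalize(e) for e in class_embeddings]


def logits(weights, feature):
    """Bias-free linear head: one dot product per class."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


def lse_sign_loss(scores, labels, gamma=1.0):
    """Assumed loss form: log(1 + sum_i exp(-y_i * gamma * s_i)).

    labels are booleans (True = class present), mapped to y_i = +1 / -1.
    The leading 0.0 term stands for the constant 1 = exp(0); subtracting
    the maximum before exponentiating keeps the computation stable.
    """
    terms = [0.0] + [
        -(1.0 if y else -1.0) * gamma * s for s, y in zip(scores, labels)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical embeddings for "ride bicycle", "hold bicycle", "ride horse".
# Semantically close classes get close vectors, so their initial scores correlate.
embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0]]
head = init_head_from_embeddings(embeddings)

feature = normalize([1.0, 0.0, 0.1])      # stand-in for a backbone image feature
scores = logits(head, feature)

true_labels = [True, True, False]          # multi-label target
flipped_labels = [not y for y in true_labels]

loss_true = lse_sign_loss(scores, true_labels)
loss_flipped = lse_sign_loss(scores, flipped_labels)
print(loss_true, loss_flipped)
```

In this toy setup the two bicycle classes start with similar scores for the same feature because their embeddings are close, which is the kind of class correlation the abstract says the initialization encodes. Because all classes share one log-sum-exp term, the gradient of this assumed loss is distributed across classes by a softmax-like weighting rather than computed independently per class.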
Related papers
- Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision [33.59153869330463]
An Efficient HOI (EHOI) detector is proposed in this work to strike a good balance between detection performance, inference complexity, and mathematical transparency.
Our contributions include the application of error correction codes (ECCs) to encode rare interaction cases.
Experimental results demonstrate the advantages of ECC-coded interaction labels and the excellent balance of detection performance and complexity of the proposed EHOI method.
arXiv Detail & Related papers (2024-08-13T16:34:06Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Decoupling Object Detection from Human-Object Interaction Recognition [37.133695677465376]
DEFR is a DEtection-FRee method to recognize Human-Object Interactions (HOI) at image level without using object location or human pose.
We present two findings that boost the performance of the detection-free approach, which significantly outperforms detection-assisted state-of-the-art methods.
arXiv Detail & Related papers (2021-12-13T03:01:49Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
- Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples.
Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z)
- Modulating Localization and Classification for Harmonized Object Detection [40.82723262074911]
We propose a mutual learning framework to modulate the two tasks.
In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy.
We achieve a significant performance gain over the baseline detectors on the COCO dataset.
arXiv Detail & Related papers (2021-03-16T10:36:02Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.