Category-Aware Transformer Network for Better Human-Object Interaction
Detection
- URL: http://arxiv.org/abs/2204.04911v1
- Date: Mon, 11 Apr 2022 07:21:24 GMT
- Title: Category-Aware Transformer Network for Better Human-Object Interaction
Detection
- Authors: Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng
Zhong, Xu Zou
- Abstract summary: We study the issue of promoting transformer-based HOI detectors by initializing the Object Query with category-aware semantic information.
Specifically, the Object Query is initialized via category priors produced by an external object detection model to yield better performance.
An HOI detection model equipped with our idea outperforms the baseline by a large margin and achieves a new state-of-the-art result.
- Score: 20.857034771924997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interactions (HOI) detection, which aims to localize a human and
a relevant object while recognizing their interaction, is crucial for
understanding a still image. Recently, transformer-based models have
significantly advanced the progress of HOI detection. However, the capability
of these models has not been fully explored since the Object Query of the model
is always simply initialized as just zeros, which would affect the performance.
In this paper, we try to study the issue of promoting transformer-based HOI
detectors by initializing the Object Query with category-aware semantic
information. To this end, we innovatively propose the Category-Aware
Transformer Network (CATN). Specifically, the Object Query would be initialized
via category priors represented by an external object detection model to yield
better performance. Moreover, such category priors can be further used for
enhancing the representation ability of features via the attention mechanism.
We first verify our idea with an Oracle experiment that initializes the
Object Query with the ground-truth category information. Extensive
experiments then show that an HOI detection model equipped with our idea
outperforms the baseline by a large margin and achieves a new
state-of-the-art result.
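
The initialization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random stand-in embeddings, and the choice to mean-pool the category vectors and copy the result into every query are all assumptions made for illustration. The list of detected categories is assumed to come from an external object detector, which is not modeled here.

```python
import random


def build_category_embeddings(categories, dim, seed=0):
    """Hypothetical stand-in for category priors: one fixed random vector per category."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name in categories}


def init_object_queries(detected, embeddings, num_queries, dim):
    """Initialize object queries from detected categories (assumed pooling scheme).

    If no known category is detected, fall back to the all-zero
    initialization that the abstract describes as the usual default.
    """
    # Deduplicate while preserving order, and ignore unknown categories.
    known = [c for c in dict.fromkeys(detected) if c in embeddings]
    if not known:
        return [[0.0] * dim for _ in range(num_queries)]
    # Mean-pool the category vectors into a single prior (an assumption).
    prior = [sum(embeddings[c][i] for c in known) / len(known) for i in range(dim)]
    # Every query starts from the same category-aware prior.
    return [list(prior) for _ in range(num_queries)]


embeddings = build_category_embeddings(["person", "bicycle", "cup"], dim=4)
# Categories as they might be reported by an external detector for one image.
queries = init_object_queries(["person", "bicycle", "person"], embeddings, num_queries=3, dim=4)
# Baseline behaviour: nothing detected, so the queries stay at zero.
baseline = init_object_queries([], embeddings, num_queries=3, dim=4)
```

In a real transformer-based detector the prior would more likely pass through a learned projection before being used as the decoder's query input; the sketch only shows the contrast between zero initialization and category-aware initialization.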
Related papers
- Dynamic Object Queries for Transformer-based Incremental Object Detection [45.41291377837515]
Incremental object detection aims to sequentially learn new classes, while maintaining the capability to locate and identify old ones.
Prior methodologies mainly tackle the forgetting issue through knowledge distillation and exemplar replay.
We propose DyQ-DETR, which incrementally expands the model representation ability to achieve stability-plasticity tradeoffs.
arXiv Detail & Related papers (2024-07-31T15:29:34Z)
- Geometric Features Enhanced Human-Object Interaction Detection [11.513009304308724]
We propose a novel end-to-end Transformer-style HOI detection model, i.e., the geometric features enhanced HOI detector (GeoHOI).
One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet.
GeoHOI effectively upgrades a Transformer-based HOI detector by using keypoint similarities that measure the likelihood of human-object interactions.
arXiv Detail & Related papers (2024-06-26T18:52:53Z)
- Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection [8.977792536037956]
In everyday indoor navigation, robots often need to detect non-distinctive small-change objects.
Existing techniques rely on high-quality class-specific object priors to regularize a change detector model.
In this study, we explore the concept of degree-of-ill-posedness (DoI) to improve both passive and active vision.
arXiv Detail & Related papers (2024-05-10T01:56:39Z)
- Relational Prior Knowledge Graphs for Detection and Instance Segmentation [24.360473253478112]
We propose a graph that enhances object features using priors.
Experimental evaluations on COCO show that the utilization of scene graphs, augmented with relational priors, offers benefits for object detection and instance segmentation.
arXiv Detail & Related papers (2023-10-11T15:15:05Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z)
- Scale-aware Automatic Augmentation for Object Detection [63.087930708444695]
We propose Scale-aware AutoAug to learn data augmentation policies for object detection.
In experiments, Scale-aware AutoAug yields significant and consistent improvement on various object detectors.
arXiv Detail & Related papers (2021-03-31T17:11:14Z)
- Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples.
Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z)
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)