Relational Context Learning for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2304.04997v1
- Date: Tue, 11 Apr 2023 06:01:10 GMT
- Title: Relational Context Learning for Human-Object Interaction Detection
- Authors: Sanghyun Kim, Deunsol Jung, Minsu Cho
- Abstract summary: We propose the multiplex relation network (MUREN) that performs rich context exchange between three decoder branches.
The proposed method learns comprehensive relational contexts for discovering HOI instances.
- Score: 34.319471023763384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent state-of-the-art methods for HOI detection typically build on
transformer architectures with two decoder branches, one for human-object pair
detection and the other for interaction classification. Such disentangled
transformers, however, may suffer from insufficient context exchange between
the branches and lead to a lack of context information for relational
reasoning, which is critical in discovering HOI instances. In this work, we
propose the multiplex relation network (MUREN) that performs rich context
exchange between three decoder branches using unary, pairwise, and ternary
relations of human, object, and interaction tokens. The proposed method learns
comprehensive relational contexts for discovering HOI instances, achieving
state-of-the-art performance on two standard benchmarks for HOI detection,
HICO-DET and V-COCO.
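As a rough illustration of the idea in the abstract, the toy sketch below forms unary, pairwise, and ternary groupings of a human, an object, and an interaction token, then shares the resulting context with all three branches. It is not the MUREN implementation: the function names (`hadamard`, `relational_context`, `exchange`), the element-wise products, and the averaging are all assumptions chosen for brevity, where a real model would use learned attention layers.

```python
from itertools import combinations
from math import prod


def hadamard(vectors):
    """Element-wise product of equal-length vectors."""
    return [prod(vals) for vals in zip(*vectors)]


def relational_context(human, obj, interaction):
    """Average unary, pairwise, and ternary relation vectors into one context.

    Hypothetical stand-in: relations are plain element-wise products,
    not learned layers.
    """
    tokens = [human, obj, interaction]
    unary = [list(t) for t in tokens]                           # 3 single-token relations
    pairwise = [hadamard(p) for p in combinations(tokens, 2)]   # 3 two-token relations
    ternary = [hadamard(tokens)]                                # 1 three-token relation
    relations = unary + pairwise + ternary
    return [sum(vals) / len(relations) for vals in zip(*relations)]


def exchange(human, obj, interaction):
    """Add the shared context to each branch token (residual-style update)."""
    ctx = relational_context(human, obj, interaction)
    return tuple(
        [t_i + c_i for t_i, c_i in zip(token, ctx)]
        for token in (human, obj, interaction)
    )


if __name__ == "__main__":
    h, o, i = exchange([1.0, 2.0], [2.0, 1.0], [1.0, 1.0])
    print(h, o, i)
```

The point of the sketch is only that every branch receives context built from all three token types, rather than each branch being updated in isolation.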
Related papers
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z) - Neural-Logic Human-Object Interaction Detection [67.4993347702353]
We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities.
Specifically, we modify the self-attention mechanism in vanilla Transformer, enabling it to reason over the <human, action, object> triplet and constitute novel interactions.
We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities.
arXiv Detail & Related papers (2023-11-16T11:47:53Z) - Parallel Reasoning Network for Human-Object Interaction Detection [53.422076419484945]
We propose a new transformer-based method named Parallel Reasoning Network (PR-Net).
PR-Net constructs two independent predictors for instance-level localization and relation-level understanding.
Our PR-Net has achieved competitive results on HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-01-09T17:00:34Z) - Knowledge Guided Bidirectional Attention Network for Human-Object
Interaction Detection [3.0915392100355192]
We argue that the independent use of the bottom-up parsing strategy in HOI is counter-intuitive and could lead to the diffusion of attention.
We introduce a novel knowledge-guided top-down attention into HOI, and propose to model the relation parsing as a "look and search" process.
We implement the process via unifying the bottom-up and top-down attention in a single encoder-decoder based model.
arXiv Detail & Related papers (2022-07-16T16:42:49Z) - Human-Object Interaction Detection via Disentangled Transformer [63.46358684341105]
We present Disentangled Transformer, where both encoder and decoder are disentangled to facilitate learning of two sub-tasks.
Our method outperforms prior work on two public HOI benchmarks by a sizeable margin.
arXiv Detail & Related papers (2022-04-20T08:15:04Z) - GEN-VLKT: Simplify Association and Enhance Interaction Understanding for
HOI Detection [17.92210977820113]
We propose Guided-Embedding Network (GEN) to attain a two-branch pipeline without post-matching.
For the association, previous two-branch methods suffer from complex and costly post-matching.
For the interaction understanding, previous methods suffer from long-tailed distribution and zero-shot discovery.
arXiv Detail & Related papers (2022-03-26T01:04:13Z) - RR-Net: Injecting Interactive Semantics in Human-Object Interaction
Detection [40.65483058890176]
The latest end-to-end HOI detectors lack relation reasoning, which leaves them unable to learn HOI-specific interactive semantics for predictions.
We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference.
Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (RR-Net).
arXiv Detail & Related papers (2021-04-30T14:03:10Z) - DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act
Recognition and Sentiment Classification [77.59549450705384]
In dialog systems, dialog act recognition and sentiment classification are two correlative tasks.
Most of the existing systems either treat them as separate tasks or just jointly model the two tasks.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)