Neural-Logic Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2311.09817v1
- Date: Thu, 16 Nov 2023 11:47:53 GMT
- Title: Neural-Logic Human-Object Interaction Detection
- Authors: Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang
- Abstract summary: We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities.
Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the <human, action, object> triplet and constitute novel interactions.
We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities.
- Score: 67.4993347702353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The interaction decoder utilized in prevalent Transformer-based HOI detectors
typically accepts pre-composed human-object pairs as inputs. Though achieving
remarkable performance, such a paradigm lacks feasibility and cannot explore
novel combinations over entities during decoding. We present LogicHOI, a new
HOI detector that leverages neural-logic reasoning and Transformer to infer
feasible interactions between entities. Specifically, we modify the
self-attention mechanism in the vanilla Transformer, enabling it to reason over
the <human, action, object> triplet and constitute novel interactions. Meanwhile,
this reasoning process is guided by two crucial properties for understanding
HOI: affordances (the potential actions an object can facilitate) and proxemics
(the spatial relations between humans and objects). We formulate these two
properties in first-order logic and ground them into continuous space to
constrain the learning process of our approach, leading to improved performance
and zero-shot generalization capabilities. We evaluate LogicHOI on V-COCO and
HICO-DET under both normal and zero-shot setups, achieving significant
improvements over existing methods.
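The abstract describes grounding first-order logic rules (e.g., affordances) into continuous space to constrain learning. A common way to do this is to relax logical connectives with a t-norm so a rule's truth value becomes a differentiable loss term. The sketch below illustrates that general idea for an affordance-style rule; the function name, tensor shapes, and the product t-norm choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fol_affordance_loss(action_probs, object_probs, afford):
    """Ground the rule  object(o) AND action(a) -> affordable(o, a)
    into a continuous loss via the product t-norm (an illustrative
    sketch, not the paper's exact formulation).

    action_probs: (N, A) predicted action probabilities per HOI query
    object_probs: (N, O) predicted object probabilities per HOI query
    afford:       (O, A) binary matrix; afford[o, a] = 1 if object o
                  affords action a
    """
    # Truth value of the premise "query predicts object o and action a":
    # outer product of per-query probabilities -> shape (N, O, A).
    premise = object_probs[:, :, None] * action_probs[:, None, :]
    # Product-t-norm relaxation of implication p -> q: 1 - p * (1 - q).
    truth = 1.0 - premise * (1.0 - afford[None, :, :])
    # Penalize how far each grounded rule instance is from being true;
    # only (object, action) pairs the affordance matrix forbids contribute.
    return float((1.0 - truth).mean())
```

Because the loss is differentiable in the predicted probabilities, it can be added to a detector's training objective to softly discourage infeasible <human, action, object> combinations.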
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection [20.983998911754792]
Two-stage Human-Object Interaction (HOI) detectors suffer from lower performance than one-stage methods.
We propose Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems.
ViPLO achieves state-of-the-art results on two public benchmarks.
arXiv Detail & Related papers (2023-04-17T09:44:54Z)
- Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection [119.93025368028083]
We design a novel Transformer-style Human-Object Interaction (HOI) detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP).
STIP decomposes HOI set prediction into two sequential phases: interaction proposal generation is performed first, and a structure-aware Transformer then transforms the non-parametric interaction proposals into HOI predictions.
The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistic semantic structure among interaction proposals as well as the local spatial structure of the human/object within each interaction proposal, so as to strengthen HOI predictions.
arXiv Detail & Related papers (2022-06-13T16:21:08Z)
- Human-Object Interaction Detection via Disentangled Transformer [63.46358684341105]
We present Disentangled Transformer, where both the encoder and decoder are disentangled to facilitate learning of two sub-tasks.
Our method outperforms prior work on two public HOI benchmarks by a sizeable margin.
arXiv Detail & Related papers (2022-04-20T08:15:04Z)
- RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection [40.65483058890176]
The latest end-to-end HOI detectors lack relation reasoning, which leads to an inability to learn HOI-specific interactive semantics for predictions.
We first present a progressive Relation-aware Frame, which brings a new structure and parameter-sharing pattern for interaction inference.
Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (RR-Net).
arXiv Detail & Related papers (2021-04-30T14:03:10Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.