Effective Actor-centric Human-object Interaction Detection
- URL: http://arxiv.org/abs/2202.11998v1
- Date: Thu, 24 Feb 2022 10:24:44 GMT
- Title: Effective Actor-centric Human-object Interaction Detection
- Authors: Kunlun Xu and Zhimin Li and Zhijun Zhang and Leizhen Dong and Wenhui
  Xu and Luxin Yan and Sheng Zhong and Xu Zou
- Abstract summary: We propose a novel actor-centric framework to detect Human-Object Interaction in images.
Our method achieves the state-of-the-art on the challenging V-COCO and HICO-DET benchmarks.
- Score: 20.564689533862524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   While Human-Object Interaction (HOI) detection has achieved tremendous
advances in recent years, it still remains challenging due to complex interactions
with multiple humans and objects occurring in images, which would inevitably
lead to ambiguities. Most existing methods either generate all human-object
pair candidates and infer their relationships by cropped local features
successively in a two-stage manner, or directly predict interaction points in a
one-stage procedure. However, the lack of spatial configurations in two-stage
methods and of reasoning steps in one-stage methods limits their performance in
such complex scenes. To avoid this ambiguity, we propose a novel actor-centric
framework. The main ideas are that when inferring interactions: 1) the
non-local features of the entire image guided by actor position are obtained to
model the relationship between the actor and context, and then 2) we use an
object branch to generate pixel-wise interaction area prediction, where the
interaction area denotes the object central area. Moreover, we also use an
actor branch to get interaction prediction of the actor and propose a novel
composition strategy based on center-point indexing to generate the final HOI
prediction. Thanks to the usage of the non-local features and the
partly-coupled property of the human-objects composition strategy, our proposed
framework can detect HOI more accurately, especially for complex images.
Extensive experimental results show that our method achieves
state-of-the-art results on the challenging V-COCO and HICO-DET benchmarks and is
more robust, especially in scenes with multiple persons and/or objects.
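
The composition step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the object branch predicts a pixel-wise interaction area (the object's central area) for an actor, that the actor branch predicts the actor's interactions, and that the two are combined by center-point indexing. Everything else here is an assumption made for illustration: the function name `compose_hoi`, the per-action area maps, the multiplication of the three scores, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in map coordinates


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer (x, y) center point of a box."""
    x1, y1, x2, y2 = box
    return int((x1 + x2) / 2), int((y1 + y2) / 2)


def compose_hoi(
    actor_action_scores: Dict[str, float],
    area_maps: Dict[str, List[List[float]]],
    objects: List[Tuple[str, Box, float]],
    threshold: float = 0.1,
) -> List[Tuple[str, str, float]]:
    """Hypothetical composition for ONE actor.

    actor_action_scores: action -> score from the actor branch.
    area_maps: action -> H x W interaction-area map from the object branch.
    objects: (label, box, detection score) for each detected object.
    """
    results = []
    for action, action_score in actor_action_scores.items():
        area = area_maps[action]
        height, width = len(area), len(area[0])
        for label, box, det_score in objects:
            cx, cy = box_center(box)
            # Clamp the center so the lookup always stays inside the map.
            cx = min(max(cx, 0), width - 1)
            cy = min(max(cy, 0), height - 1)
            # Center-point indexing: read the predicted interaction-area
            # score at the object's center (assumed combination rule).
            score = action_score * area[cy][cx] * det_score
            if score >= threshold:
                results.append((action, label, score))
    # Highest-scoring (action, object) pairs first.
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

In this sketch an object is paired with the actor only if its center falls on a high-scoring pixel of that actor's interaction-area map, so the human and object predictions are linked through a single indexed lookup rather than through cropped pair features.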
 
      
Related papers
- HOComp: Interaction-Aware Human-Object Composition [62.93211305213214]
 HOComp is a novel approach for compositing a foreground object onto a human-centric background image.
Experimental results on our dataset show that HOComp effectively generates human-object interactions with consistent appearances.
 arXiv  Detail & Related papers  (2025-07-22T17:59:21Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
 We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
 arXiv  Detail & Related papers  (2024-10-15T07:35:51Z)
- Disentangled Interaction Representation for One-Stage Human-Object
  Interaction Detection [70.96299509159981]
 Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
 arXiv  Detail & Related papers  (2023-12-04T08:02:59Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
 We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
 arXiv  Detail & Related papers  (2023-08-20T04:12:50Z)
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [26.664864824357164]
 We present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image.
Our proposed algorithm achieves the state-of-the-art performance in two HOI detection benchmarks with an inference time under 1 ms after object detection.
 arXiv  Detail & Related papers  (2021-04-28T10:10:29Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
 We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
 arXiv  Detail & Related papers  (2020-08-26T17:59:40Z)
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
 We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
 arXiv  Detail & Related papers  (2020-04-18T15:34:41Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
 We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
 arXiv  Detail & Related papers  (2020-03-31T08:42:06Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
 We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
 arXiv  Detail & Related papers  (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     