Affordance Transfer Learning for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2104.02867v1
- Date: Wed, 7 Apr 2021 02:37:04 GMT
- Title: Affordance Transfer Learning for Human-Object Interaction Detection
- Authors: Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao
- Abstract summary: We introduce an affordance transfer learning approach to jointly detect HOIs with novel objects and recognize affordances.
Specifically, HOI representations can be decoupled into a combination of affordance and object representations.
With the proposed affordance transfer learning, the model is also capable of inferring the affordances of novel objects from known affordance representations.
- Score: 106.37536031160282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning about human-object interactions (HOI) is essential for deeper scene
understanding, while object affordances (or functionalities) are of great
importance for humans to discover unseen HOIs with novel objects. Inspired by
this, we introduce an affordance transfer learning approach to jointly detect
HOIs with novel objects and recognize affordances. Specifically, HOI
representations can be decoupled into a combination of affordance and object
representations, making it possible to compose novel interactions by combining
affordance representations and novel object representations from additional
images, i.e. transferring the affordance to novel objects. With the proposed
affordance transfer learning, the model is also capable of inferring the
affordances of novel objects from known affordance representations. The
proposed method can thus be used to 1) improve the performance of HOI
detection, especially for HOIs with unseen objects; and 2) infer the
affordances of novel objects. Experimental results on two datasets, HICO-DET
and HOI-COCO (from V-COCO), demonstrate significant improvements over recent
state-of-the-art methods for HOI detection and object affordance detection.
Code is available at https://github.com/zhihou7/HOI-CL
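The composition described in the abstract can be pictured with a short PyTorch sketch: an HOI feature is split into an affordance (verb) part and an object part, and the affordance part can be recombined with object features taken from other images to score composed, possibly unseen, interactions. This is a minimal illustration under those assumptions only; all module and variable names are hypothetical and do not correspond to the released implementation (see the repository above for the actual code).

```python
# Hypothetical sketch of affordance-object composition; not the authors' code.
import torch
import torch.nn as nn

class CompositionalHOIHead(nn.Module):
    def __init__(self, feat_dim: int, num_verbs: int, num_objects: int):
        super().__init__()
        # Split a joint human-object feature into affordance (verb) and object parts.
        self.affordance_encoder = nn.Linear(feat_dim, feat_dim)
        self.object_encoder = nn.Linear(feat_dim, feat_dim)
        # Classifiers over the (re)composed representation.
        self.verb_classifier = nn.Linear(2 * feat_dim, num_verbs)
        self.object_classifier = nn.Linear(feat_dim, num_objects)

    def compose(self, affordance_feat, object_feat):
        # Concatenate an affordance representation with an object representation
        # to form an HOI representation.
        return torch.cat([affordance_feat, object_feat], dim=-1)

    def forward(self, hoi_feat, extra_object_feat=None):
        aff = self.affordance_encoder(hoi_feat)
        obj = self.object_encoder(hoi_feat)
        outputs = {
            "verb_logits": self.verb_classifier(self.compose(aff, obj)),
            "object_logits": self.object_classifier(obj),
        }
        if extra_object_feat is not None:
            # "Affordance transfer": pair the same affordance features with object
            # features from additional images, yielding verb scores for composed
            # (possibly novel) human-object interactions.
            outputs["transferred_verb_logits"] = self.verb_classifier(
                self.compose(aff, extra_object_feat))
        return outputs
```

In this sketch, extra_object_feat would come from object regions detected in additional images without interaction labels, which is how affordances could be attached to novel object categories.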
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in terms of objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Exploring Predicate Visual Context in Detecting Human-Object Interactions [44.937383506126274]
We study how best to re-introduce image features via cross-attention.
Our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-08-11T15:57:45Z)
- Compositional Learning in Transformer-Based Human-Object Interaction Detection [6.630793383852106]
The long-tailed distribution of labeled instances is a primary challenge in HOI detection.
Inspired by the nature of HOI triplets, some existing approaches adopt the idea of compositional learning.
We propose a transformer-based framework for compositional HOI learning.
arXiv Detail & Related papers (2023-08-11T06:41:20Z)
- Spatial Reasoning for Few-Shot Object Detection [21.3564383157159]
We propose a spatial reasoning framework that detects novel objects with only a few training examples in a context.
We employ a graph convolutional network in which the RoIs and their relatedness are defined as nodes and edges, respectively.
We demonstrate that the proposed method significantly outperforms the state-of-the-art methods and verify its efficacy through extensive ablation studies.
arXiv Detail & Related papers (2022-11-02T12:38:08Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method, SG2HOI, that exploits scene graph information for the Human-Object Interaction detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Detecting Human-Object Interaction via Fabricated Compositional Learning [106.37536031160282]
Human-Object Interaction (HOI) detection is a fundamental task for high-level scene understanding.
Humans have an extremely powerful compositional perception ability to recognize rare or unseen HOI samples.
We propose Fabricated Compositional Learning (FCL) to address the problem of open long-tailed HOI detection.
arXiv Detail & Related papers (2021-03-15T08:52:56Z)
- Visual Compositional Learning for Human-Object Interaction Detection [111.05263071111807]
Human-Object Interaction (HOI) detection aims to localize and infer relationships between humans and objects in an image.
It is challenging because the enormous number of possible combinations of object and verb types forms a long-tailed distribution.
We devise a deep Visual Compositional Learning framework, a simple yet efficient approach to effectively address this problem.
arXiv Detail & Related papers (2020-07-24T08:37:40Z)