Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO
- URL: http://arxiv.org/abs/2201.02396v1
- Date: Fri, 7 Jan 2022 11:00:11 GMT
- Title: Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO
- Authors: Astrid Orcesi, Romaric Audigier, Fritz Poka Toukam and Bertrand Luvison
- Abstract summary: We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O).
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
- Score: 29.0200561485714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting human interactions is crucial for human behavior analysis. Many
methods have been proposed to deal with Human-to-Object Interaction (HOI)
detection, i.e., detecting in an image which person and object interact
together and classifying the type of interaction. However, Human-to-Human
Interactions, such as social and violent interactions, are generally not
considered in available HOI training datasets. As we believe these types of
interactions cannot be ignored or decorrelated from HOI when analyzing human
behavior, we propose a new interaction dataset that covers both types of human
interactions: Human-to-Human-or-Object (H2O). In addition, we introduce a novel
taxonomy of verbs, intended to be closer to a description of human body
attitude in relation to the surrounding targets of interaction, and more
independent of the environment. Unlike some existing datasets, we strive to
avoid defining synonymous verbs when their use highly depends on the target
type or requires a high level of semantic interpretation. Since the H2O dataset
includes V-COCO images annotated with this new taxonomy, its images naturally
contain more interactions. This can be an issue for HOI detection methods whose
complexity depends on the number of people, targets or interactions. Thus, we
propose DIABOLO (Detecting InterActions By Only Looking Once), an efficient
subject-centric single-shot method to detect all interactions in one forward
pass, with constant inference time independent of image content. In addition,
this multi-task network simultaneously detects all people and objects. We show
that sharing a single network across these tasks not only saves computational
resources but also improves performance, as the tasks benefit from each other.
Finally, DIABOLO is a strong baseline for the newly proposed challenge of H2O
interaction detection, as it outperforms all state-of-the-art methods when trained and evaluated on the HOI
dataset V-COCO.
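The abstract describes DIABOLO's key design choices at a high level: one shared network detects all people and objects and, in the same forward pass, predicts interactions from the subject's point of view, so inference cost does not depend on how many interactions the image contains. The PyTorch snippet below is only a minimal sketch of that idea under assumed shapes and head designs (backbone depth, grid stride, verb count and the target-offset channels are illustrative placeholders); it is not the authors' DIABOLO architecture.

```python
# Minimal sketch of a subject-centric, single-shot interaction network.
# Assumptions (not from the paper): backbone depth, grid stride, head shapes
# and the target-offset regression are illustrative placeholders.
import torch
import torch.nn as nn


class SingleShotInteractionNet(nn.Module):
    def __init__(self, num_object_classes: int = 80, num_verbs: int = 51):
        super().__init__()
        # Shared backbone: one forward pass, cost independent of image content.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: per-cell objectness, class scores and box offsets.
        self.det_head = nn.Conv2d(128, 1 + num_object_classes + 4, 1)
        # Interaction head: per-cell (subject-centric) verb scores plus a
        # 2-D offset pointing from the subject cell towards its target.
        self.inter_head = nn.Conv2d(128, num_verbs + 2, 1)

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)           # (B, 128, H/8, W/8)
        detections = self.det_head(feats)       # dense boxes for people/objects
        interactions = self.inter_head(feats)   # dense verbs anchored on subjects
        return detections, interactions


if __name__ == "__main__":
    net = SingleShotInteractionNet()
    dets, inters = net(torch.randn(1, 3, 256, 256))
    # Output sizes depend only on the input resolution, never on how many
    # people, objects or interactions the image happens to contain.
    print(dets.shape, inters.shape)
```

Because both heads are dense predictions over a fixed feature grid, the output size and the compute cost are set by the input resolution alone, which is the constant-inference-time property the abstract emphasizes.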
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions. A simplified message-passing sketch appears after this list.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Transferable Interactiveness Knowledge for Human-Object Interaction Detection [46.89715038756862]
We explore interactiveness knowledge which indicates whether a human and an object interact with each other or not.
We found that interactiveness knowledge can be learned across HOI datasets and bridge the gap between diverse HOI category settings.
Our core idea is to exploit an interactiveness network to learn the general interactiveness knowledge from multiple HOI datasets. A minimal pair-filtering sketch appears after this list.
arXiv Detail & Related papers (2021-01-25T18:21:07Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction; a keypoint-style sketch of this idea appears after this list.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- Classifying All Interacting Pairs in a Single Shot [29.0200561485714]
We introduce a novel human interaction detection approach, based on CALIPSO, a classifier of human-object interactions.
It estimates interactions simultaneously for all human-object pairs, regardless of their number and class.
It leads to a constant complexity and computation time independent of the number of subjects, objects or interactions in the image.
arXiv Detail & Related papers (2020-01-13T15:51:45Z)
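The SG2HOI entry above mentions a relation-aware message-passing module that gathers relationships from an object's neighborhood in the scene graph. The following snippet is a hypothetical, heavily simplified sketch of that kind of module; the feature sizes, the mean aggregation and the GRU update are assumptions for illustration, not the SG2HOI implementation.

```python
# Hypothetical sketch of relation-aware message passing over a scene graph.
# Node/edge dimensions and the mean-aggregation rule are illustrative only.
import torch
import torch.nn as nn


class RelationMessagePassing(nn.Module):
    def __init__(self, node_dim: int = 256, rel_dim: int = 64):
        super().__init__()
        self.message_fn = nn.Linear(node_dim + rel_dim, node_dim)
        self.update_fn = nn.GRUCell(node_dim, node_dim)

    def forward(self, nodes, edges, rel_feats):
        # nodes:     (N, node_dim) detected person/object features
        # edges:     (E, 2) indices (source, target) of scene-graph relations
        # rel_feats: (E, rel_dim) embedding of each relation predicate
        src, dst = edges[:, 0], edges[:, 1]
        msgs = torch.relu(self.message_fn(torch.cat([nodes[src], rel_feats], dim=-1)))
        # Average incoming messages per target node.
        agg = torch.zeros_like(nodes)
        count = torch.zeros(nodes.size(0), 1)
        agg.index_add_(0, dst, msgs)
        count.index_add_(0, dst, torch.ones(len(dst), 1))
        agg = agg / count.clamp(min=1)
        # GRU update: refine each node with the messages from its incoming relations.
        return self.update_fn(agg, nodes)


if __name__ == "__main__":
    mp = RelationMessagePassing()
    nodes = torch.randn(5, 256)                  # 5 detected persons/objects
    edges = torch.tensor([[0, 1], [2, 1], [3, 4]])
    rels = torch.randn(3, 64)                    # e.g. "next to", "holding", ...
    print(mp(nodes, edges, rels).shape)          # (5, 256) refined node features
```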
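The Transferable Interactiveness Knowledge entry above is built on a binary question ("do this human and this object interact at all?") answered before fine-grained verb classification. A minimal sketch of such a pair filter is shown below; the pair features, network sizes and the 0.5 threshold are assumptions, not the paper's model.

```python
# Minimal sketch of an interactiveness filter: score every human-object pair
# for "do they interact at all?" and only run verb classification on the
# pairs that survive. Feature sizes and the 0.5 threshold are assumptions.
import torch
import torch.nn as nn


class InteractivenessFilter(nn.Module):
    def __init__(self, feat_dim: int = 256, num_verbs: int = 29):
        super().__init__()
        self.interactiveness = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )
        self.verb_classifier = nn.Linear(2 * feat_dim, num_verbs)

    def forward(self, human_feats, object_feats, threshold: float = 0.5):
        # human_feats, object_feats: (P, feat_dim) for P candidate pairs
        pair = torch.cat([human_feats, object_feats], dim=-1)
        keep = torch.sigmoid(self.interactiveness(pair)).squeeze(-1) > threshold
        verbs = torch.sigmoid(self.verb_classifier(pair[keep]))
        return keep, verbs                       # verbs only for interactive pairs


if __name__ == "__main__":
    f = InteractivenessFilter()
    keep, verbs = f(torch.randn(10, 256), torch.randn(10, 256))
    print(keep.sum().item(), "pairs kept;", verbs.shape)
```

Because the filter only makes a binary decision, it can in principle be trained on several HOI datasets at once even when their verb taxonomies differ, which is the transfer property the entry highlights.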
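The Interaction Points entry above represents each interaction as a point, commonly taken at the midpoint between the human and object centers, that a fully-convolutional network predicts as a per-verb heatmap. The snippet below sketches only how such a ground-truth heatmap could be rendered; the output stride, heatmap size and Gaussian radius are illustrative assumptions.

```python
# Sketch of the "interaction point" idea: the interaction between a human and
# an object is represented as a keypoint at the midpoint of their box centers,
# rendered as a Gaussian on a per-verb heatmap. The Gaussian radius and the
# 4x output stride are assumptions for illustration.
import numpy as np


def interaction_point_heatmap(human_box, object_box, verb_id,
                              num_verbs=29, out_size=(64, 64), stride=4, sigma=2.0):
    """Render one ground-truth interaction point onto a (num_verbs, H, W) heatmap."""
    hx = (human_box[0] + human_box[2]) / 2.0
    hy = (human_box[1] + human_box[3]) / 2.0
    ox = (object_box[0] + object_box[2]) / 2.0
    oy = (object_box[1] + object_box[3]) / 2.0
    # Interaction point: midpoint of the two centers, in heatmap coordinates.
    px, py = (hx + ox) / 2.0 / stride, (hy + oy) / 2.0 / stride
    heatmap = np.zeros((num_verbs,) + out_size, dtype=np.float32)
    ys, xs = np.mgrid[0:out_size[0], 0:out_size[1]]
    heatmap[verb_id] = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return heatmap


if __name__ == "__main__":
    hm = interaction_point_heatmap(human_box=(30, 40, 90, 200),
                                   object_box=(100, 120, 160, 180), verb_id=5)
    print(hm.shape, np.unravel_index(hm.argmax(), hm.shape))  # peak at the point
```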
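Finally, the CALIPSO entry (by the same authors as DIABOLO) stresses scoring every subject-target pair simultaneously, at a cost that does not grow with the number of people, objects or interactions. One way to obtain that behaviour is to give every spatial location a subject embedding and a target embedding and combine them with a single batched product; the sketch below is a hypothetical illustration of that idea, not the CALIPSO model.

```python
# Hypothetical sketch of dense all-pairs interaction scoring: every spatial
# location gets a "subject" embedding and a "target" embedding, and verb
# scores for all location pairs come from one bilinear product. The embedding
# size and per-verb weight matrices are illustrative assumptions.
import torch
import torch.nn as nn


class AllPairsVerbScorer(nn.Module):
    def __init__(self, feat_dim: int = 128, embed_dim: int = 64, num_verbs: int = 29):
        super().__init__()
        self.subject_proj = nn.Conv2d(feat_dim, embed_dim, 1)
        self.target_proj = nn.Conv2d(feat_dim, embed_dim, 1)
        # One bilinear form per verb.
        self.verb_weights = nn.Parameter(torch.randn(num_verbs, embed_dim, embed_dim) * 0.01)

    def forward(self, feats):
        # feats: (B, feat_dim, H, W) backbone features
        subj = self.subject_proj(feats).flatten(2)   # (B, D, H*W)
        targ = self.target_proj(feats).flatten(2)    # (B, D, H*W)
        # scores[b, v, i, j] = subj[b, :, i]^T  W_v  targ[b, :, j]
        scores = torch.einsum("bdi,vde,bej->bvij", subj, self.verb_weights, targ)
        return scores  # (B, num_verbs, H*W, H*W): every subject-target pair at once


if __name__ == "__main__":
    scorer = AllPairsVerbScorer()
    print(scorer(torch.randn(1, 128, 16, 16)).shape)  # torch.Size([1, 29, 256, 256])
```

The score tensor's size is fixed by the feature-map resolution, so image content does not change the amount of computation, matching the constant-complexity claim in the entry.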