Self-Selective Context for Interaction Recognition
- URL: http://arxiv.org/abs/2010.08750v1
- Date: Sat, 17 Oct 2020 09:06:12 GMT
- Title: Self-Selective Context for Interaction Recognition
- Authors: Mert Kilickaya, Noureldien Hussein, Efstratios Gavves, Arnold
Smeulders
- Abstract summary: We propose Self-Selective Context (SSC) for human-object interaction recognition.
SSC operates on the joint appearance of human-objects and context to bring the most discriminative context into play for recognition.
Our experiments show that SSC leads to a significant increase in interaction recognition performance while using far fewer parameters.
- Score: 27.866495303658404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interaction recognition aims to identify the
relationship between a human subject and an object. Researchers incorporate
global scene context into the early layers of deep Convolutional Neural
Networks as a solution. They report a significant increase in performance,
since interactions generally correlate with the scene (i.e., riding a bicycle
on a city street). However, this approach leads to the following problems. It
increases the network size in the early layers, and is therefore inefficient.
It produces noisy filter responses when the scene is irrelevant, and is
therefore inaccurate. It leverages only scene context, whereas human-object
interactions offer a multitude of contexts, and is therefore incomplete. To
circumvent these issues, in this work, we propose Self-Selective Context
(SSC). SSC operates on the joint appearance of human-objects and context to
bring the most discriminative context(s) into play for recognition. We devise
novel contextual features that model the locality of human-object interactions
and show that SSC can seamlessly integrate with state-of-the-art interaction
recognition models.
Our experiments show that SSC leads to a significant increase in interaction
recognition performance while using far fewer parameters.
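The idea of selecting the most discriminative context from the joint human-object appearance can be sketched as dot-product attention over candidate context features. All function names, shapes, and the attention form below are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_selective_context(ho_appearance, context_feats, w_query, w_keys):
    """Toy sketch of a self-selective context mechanism.

    The joint human-object appearance produces a query; each candidate
    context feature (e.g. scene, local surroundings) produces a key.
    Attention weights pick out the most relevant context(s), and the
    weighted mixture is fused with the appearance feature.
    """
    query = w_query @ ho_appearance                       # (d,)
    keys = np.stack([w_keys @ c for c in context_feats])  # (n_ctx, d)
    scores = keys @ query / np.sqrt(len(query))           # relevance per context
    weights = softmax(scores)                             # self-selection weights
    selected = weights @ np.stack(context_feats)          # weighted context mixture
    return np.concatenate([ho_appearance, selected]), weights
```

Because only the fused feature (appearance plus selected context) feeds the recognition head, irrelevant contexts are down-weighted instead of injecting noise into early layers.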
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction [2.1261712640167856]
Emotion recognition is a crucial task for human conversation understanding.
We propose a Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT).
CORECT effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies.
arXiv Detail & Related papers (2023-11-08T07:46:25Z)
- Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z)
- Effective Actor-centric Human-object Interaction Detection [20.564689533862524]
We propose a novel actor-centric framework to detect Human-Object Interaction in images.
Our method achieves the state-of-the-art on the challenging V-COCO and HICO-DET benchmarks.
arXiv Detail & Related papers (2022-02-24T10:24:44Z)
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O).
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
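The two mechanisms in this entry, a globally pooled scene-graph context and relation-aware message passing, could be sketched roughly as below. Shapes, pooling, and fusion are assumptions for illustration, not the SG2HOI implementation:

```python
import numpy as np

def scene_graph_context(node_feats, edges, edge_feats, human_idx, obj_idx):
    """Toy sketch of using a scene graph for HOI in two ways:
    (1) pool the graph into a global scene context vector;
    (2) message-pass relation features from each node's neighborhood
        and attach them to the human-object pair.
    """
    # (1) global scene context: mean-pool all node features
    global_ctx = np.mean(node_feats, axis=0)

    # (2) one round of relation-aware message passing: average the
    # relation features on edges arriving at each node
    messages = np.zeros_like(node_feats)
    counts = np.zeros(len(node_feats))
    for (src, dst), rel in zip(edges, edge_feats):
        messages[dst] += rel
        counts[dst] += 1
    node_ctx = messages / np.maximum(counts, 1)[:, None]

    # fuse: pair representation = enriched human + enriched object + scene
    return np.concatenate([node_feats[human_idx] + node_ctx[human_idx],
                           node_feats[obj_idx] + node_ctx[obj_idx],
                           global_ctx])
```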
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection [63.37410475907447]
This work proposes a heterogeneous graph network that models humans and objects as different kinds of nodes.
In addition, a graph attention mechanism based on the intra-class context and inter-class context is exploited to improve the learning.
arXiv Detail & Related papers (2020-10-20T04:20:33Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
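The multi-stage, coarse-to-fine scheme in this last entry, where each stage refines proposals before interaction recognition, can be sketched as a simple loop. The interfaces below are assumed for illustration, not the paper's actual networks:

```python
import numpy as np

def cascaded_hoi(proposals, refine_stages, recognize):
    """Toy sketch of a coarse-to-fine HOI cascade.

    Each stage's instance localization step refines the human-object
    proposals; the interaction recognition step then scores the refined
    proposals, and the next stage starts from this refinement.
    """
    scores = None
    for refine in refine_stages:
        proposals = refine(proposals)    # progressively refine HOI proposals
        scores = recognize(proposals)    # recognize interactions on them
    return proposals, scores
```

A later stage therefore always operates on better-localized proposals than an earlier one, which is the point of the coarse-to-fine design.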
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.