Self-Selective Context for Interaction Recognition
- URL: http://arxiv.org/abs/2010.08750v1
- Date: Sat, 17 Oct 2020 09:06:12 GMT
- Title: Self-Selective Context for Interaction Recognition
- Authors: Mert Kilickaya, Noureldien Hussein, Efstratios Gavves, Arnold
Smeulders
- Abstract summary: We propose Self-Selective Context (SSC) for human-object interaction recognition.
SSC operates on the joint appearance of human-objects and context to bring the most discriminative context into play for recognition.
Our experiments show that SSC leads to a significant increase in interaction recognition performance while using far fewer parameters.
- Score: 27.866495303658404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interaction recognition aims to identify the
relationship between a human subject and an object. Researchers incorporate
global scene context into the early layers of deep Convolutional Neural
Networks as a solution. They report a significant increase in performance,
since interactions generally correlate with the scene (i.e., riding a bicycle
on a city street). However, this approach leads to the following problems. It
increases the network size in the early layers, and is therefore inefficient.
It produces noisy filter responses when the scene is irrelevant, and is
therefore inaccurate. It leverages only scene context, whereas human-object
interactions offer a multitude of contexts, and is therefore incomplete. To
circumvent these issues, in this work, we propose Self-Selective Context
(SSC). SSC operates on the joint appearance of human-objects and context to
bring the most discriminative context(s) into play for recognition. We devise
novel contextual features that model the locality of human-object interactions
and show that SSC can seamlessly integrate with state-of-the-art interaction
recognition models.
Our experiments show that SSC leads to a significant increase in interaction
recognition performance while using far fewer parameters.
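The idea of selecting the most discriminative context from the joint human-object appearance can be sketched as dot-product attention over candidate context features. All function names, shapes, and the attention form below are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_selective_context(ho_appearance, context_feats, w_query, w_keys):
    """Toy sketch of a self-selective context mechanism.

    The joint human-object appearance produces a query; each candidate
    context feature (e.g. scene, local surroundings) produces a key.
    Attention weights pick out the most relevant context(s), and the
    weighted mixture is fused with the appearance feature.
    """
    query = w_query @ ho_appearance                       # (d,)
    keys = np.stack([w_keys @ c for c in context_feats])  # (n_ctx, d)
    scores = keys @ query / np.sqrt(len(query))           # relevance per context
    weights = softmax(scores)                             # self-selection weights
    selected = weights @ np.stack(context_feats)          # weighted context mixture
    return np.concatenate([ho_appearance, selected]), weights
```

Because only the fused feature (appearance plus selected context) feeds the recognition head, irrelevant contexts are down-weighted instead of injecting noise into early layers.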
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction [2.1261712640167856]
Emotion recognition is a crucial task for human conversation understanding.
We propose a Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT).
CORECT effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies.
arXiv Detail & Related papers (2023-11-08T07:46:25Z)
- Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z)
- Effective Actor-centric Human-object Interaction Detection [20.564689533862524]
We propose a novel actor-centric framework to detect Human-Object Interaction in images.
Our method achieves the state-of-the-art on the challenging V-COCO and HICO-DET benchmarks.
arXiv Detail & Related papers (2022-02-24T10:24:44Z)
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O).
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
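The two mechanisms in this entry, a globally pooled scene-graph context and relation-aware message passing, could be sketched roughly as below. Shapes, pooling, and fusion are assumptions for illustration, not the SG2HOI implementation:

```python
import numpy as np

def scene_graph_context(node_feats, edges, edge_feats, human_idx, obj_idx):
    """Toy sketch of using a scene graph for HOI in two ways:
    (1) pool the graph into a global scene context vector;
    (2) message-pass relation features from each node's neighborhood
        and attach them to the human-object pair.
    """
    # (1) global scene context: mean-pool all node features
    global_ctx = np.mean(node_feats, axis=0)

    # (2) one round of relation-aware message passing: average the
    # relation features on edges arriving at each node
    messages = np.zeros_like(node_feats)
    counts = np.zeros(len(node_feats))
    for (src, dst), rel in zip(edges, edge_feats):
        messages[dst] += rel
        counts[dst] += 1
    node_ctx = messages / np.maximum(counts, 1)[:, None]

    # fuse: pair representation = enriched human + enriched object + scene
    return np.concatenate([node_feats[human_idx] + node_ctx[human_idx],
                           node_feats[obj_idx] + node_ctx[obj_idx],
                           global_ctx])
```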
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection [63.37410475907447]
This work proposes a heterogeneous graph network that models humans and objects as different kinds of nodes.
In addition, a graph attention mechanism based on the intra-class context and inter-class context is exploited to improve the learning.
arXiv Detail & Related papers (2020-10-20T04:20:33Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
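The multi-stage, coarse-to-fine scheme in this last entry, where each stage refines proposals before interaction recognition, can be sketched as a simple loop. The interfaces below are assumed for illustration, not the paper's actual networks:

```python
import numpy as np

def cascaded_hoi(proposals, refine_stages, recognize):
    """Toy sketch of a coarse-to-fine HOI cascade.

    Each stage's instance localization step refines the human-object
    proposals; the interaction recognition step then scores the refined
    proposals, and the next stage starts from this refinement.
    """
    scores = None
    for refine in refine_stages:
        proposals = refine(proposals)    # progressively refine HOI proposals
        scores = recognize(proposals)    # recognize interactions on them
    return proposals, scores
```

A later stage therefore always operates on better-localized proposals than an earlier one, which is the point of the coarse-to-fine design.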
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.