Spatial Priming for Detecting Human-Object Interactions
- URL: http://arxiv.org/abs/2004.04851v1
- Date: Thu, 9 Apr 2020 23:20:30 GMT
- Title: Spatial Priming for Detecting Human-Object Interactions
- Authors: Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama
Chellappa
- Abstract summary: We present a method for exploiting spatial layout information for detecting human-object interactions (HOIs) in images.
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
The proposed model reaches an mAP of 24.79% on the HICO-Det dataset, about 2.8 absolute points higher than the current state-of-the-art.
- Score: 89.22921959224396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The relative spatial layout of a human and an object is an important cue for
determining how they interact. However, spatial layout has so far been used
only as side information for detecting human-object interactions (HOIs). In
this paper, we present a method for exploiting this spatial layout information
for detecting HOIs in images. The proposed method consists of a layout module
which primes a visual module to predict the type of interaction between a human
and an object. The visual and layout modules share information through lateral
connections at several stages. The model uses predictions from the layout
module as a prior to the visual module and the prediction from the visual
module is given as the final output. It also incorporates semantic information
about the object using word2vec vectors. The proposed model reaches an mAP of
24.79% on the HICO-Det dataset, about 2.8 absolute points higher than the
current state-of-the-art.
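The two-stream design described in the abstract can be caricatured in a few lines of NumPy: a layout stream over box coordinates primes a visual stream through an additive lateral connection, and the layout prediction serves as a prior that is multiplied into the visual prediction. All dimensions, the single-layer streams, and the randomly initialized weights below are hypothetical stand-ins; the actual model uses learned CNN features and lateral connections at several stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    # One dense layer with ReLU, standing in for each stream's backbone.
    return np.maximum(x @ w + b, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_verbs = 117  # HICO-Det has 117 interaction (verb) classes

# Hypothetical weights; in the paper these are learned end-to-end.
W_lay1, b_lay1 = rng.standard_normal((8, 64)) * 0.1, np.zeros(64)
W_lay2, b_lay2 = rng.standard_normal((64, n_verbs)) * 0.1, np.zeros(n_verbs)
W_vis1, b_vis1 = rng.standard_normal((1024, 64)) * 0.1, np.zeros(64)
W_vis2, b_vis2 = rng.standard_normal((64, n_verbs)) * 0.1, np.zeros(n_verbs)

def predict_hoi(human_box, object_box, visual_feat):
    # Layout stream: encode the raw geometry of the two boxes (2 x 4 coords).
    layout_in = np.concatenate([human_box, object_box])
    h_lay = dense_relu(layout_in, W_lay1, b_lay1)
    layout_scores = softmax(h_lay @ W_lay2 + b_lay2)

    # Visual stream, primed by a lateral connection: layout features are
    # added into the visual hidden state.
    h_vis = dense_relu(visual_feat, W_vis1, b_vis1) + h_lay
    visual_scores = softmax(h_vis @ W_vis2 + b_vis2)

    # The layout prediction acts as a prior on the visual prediction;
    # the renormalized product is the final output.
    final = layout_scores * visual_scores
    return final / final.sum()

scores = predict_hoi(np.array([0.1, 0.2, 0.5, 0.9]),
                     np.array([0.4, 0.5, 0.8, 1.0]),
                     rng.standard_normal(1024))
```

The multiplicative fusion at the end is what makes the layout output a prior rather than just another feature: verbs the layout stream rules out are suppressed regardless of the visual score.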
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics [6.678312249123534]
We aim to boost end-to-end models with object-guided statistical priors.
We propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy.
Combined, the above modules compose the Object-guided Cross-modal Network (OCN).
arXiv Detail & Related papers (2022-02-01T07:39:04Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
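The relation-aware message passing in (2) can be sketched in a few lines of NumPy: each node of a human-object pair aggregates relation features from the edges in its scene-graph neighborhood before the pair is represented. The toy graph, feature dimension, and mean aggregation here are illustrative assumptions, not the SG2HOI implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 16
# Hypothetical scene graph: per-node features and per-edge relation features.
node_feats = {n: rng.standard_normal(d) for n in ["person", "horse", "saddle"]}
rel_feats = {("person", "horse"): rng.standard_normal(d),
             ("saddle", "horse"): rng.standard_normal(d)}

def neighborhood_message(node):
    # Gather relation features from every edge touching `node`, then average.
    msgs = [f for (s, o), f in rel_feats.items() if node in (s, o)]
    return np.mean(msgs, axis=0) if msgs else np.zeros(d)

def pair_representation(human, obj):
    # Each side's features are enriched with messages from its scene-graph
    # neighborhood before the pair is scored for an interaction.
    return np.concatenate([node_feats[human] + neighborhood_message(human),
                           node_feats[obj] + neighborhood_message(obj)])

rep = pair_representation("person", "horse")
```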
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Intrinsic Relationship Reasoning for Small Object Detection [44.68289739449486]
Small objects in images and videos are usually not independent individuals. Instead, they more or less present some semantic and spatial layout relationships with each other.
We propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.
arXiv Detail & Related papers (2020-09-02T06:03:05Z)
- Pose-based Modular Network for Human-Object Interaction Detection [5.6397911482914385]
We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection.
To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks.
arXiv Detail & Related papers (2020-08-05T10:56:09Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is created on top of two Faster-RCNN models that share their features to get enhanced representations of both.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
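A common way to define such an interaction point, used by point-based HOI detectors, is the midpoint of the human and object box centers; the network then predicts class-wise heatmaps peaked at these points. The sketch below shows only this geometric definition, not the detection network.

```python
import numpy as np

def box_center(box):
    # box = (x1, y1, x2, y2)
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def interaction_point(human_box, object_box):
    # The interaction point sits midway between the two box centers;
    # the interaction is localized and classified at this keypoint.
    return (box_center(human_box) + box_center(object_box)) / 2.0

pt = interaction_point((10, 20, 50, 100), (60, 40, 120, 90))
# human center (30, 60), object center (90, 65) -> midpoint (60.0, 62.5)
```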
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.