Detecting out-of-context objects using contextual cues
- URL: http://arxiv.org/abs/2202.05930v1
- Date: Fri, 11 Feb 2022 23:15:01 GMT
- Title: Detecting out-of-context objects using contextual cues
- Authors: Manoj Acharya, Anirban Roy, Kaushik Koneripalli, Susmit Jha,
Christopher Kanan, Ajay Divakaran
- Abstract summary: We propose a graph contextual reasoning network (GCRN) to detect out-of-context (OOC) objects in an image.
GCRN consists of two separate graphs to predict object labels based on the contextual cues in the image.
GCRN explicitly captures the contextual cues to improve the detection of in-context objects and identify objects that violate contextual relations.
- Score: 29.92843037720968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an approach to detect out-of-context (OOC) objects in an
image. Given an image with a set of objects, our goal is to determine if an
object is inconsistent with the scene context and detect the OOC object with a
bounding box. In this work, we consider commonly explored contextual relations
such as co-occurrence relations, the relative size of an object with respect to
other objects, and the position of the object in the scene. We posit that
contextual cues are useful to determine object labels for in-context objects
and inconsistent context cues are detrimental to determining object labels for
out-of-context objects. To realize this hypothesis, we propose a graph
contextual reasoning network (GCRN) to detect OOC objects. GCRN consists of two
separate graphs to predict object labels based on the contextual cues in the
image: 1) a representation graph to learn object features based on the
neighboring objects and 2) a context graph to explicitly capture contextual
cues from the neighboring objects. GCRN explicitly captures the contextual cues
to improve the detection of in-context objects and identify objects that
violate contextual relations. In order to evaluate our approach, we create a
large-scale dataset by adding OOC object instances to the COCO images. We also
evaluate on the recent OCD benchmark. Our results show that GCRN outperforms
competitive baselines in detecting OOC objects and correctly detecting
in-context objects.
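To make the two-graph design concrete, here is a minimal, hypothetical PyTorch sketch. The layer, the class names (`GraphLayer`, `TwoGraphOOCModel`), the dense per-image adjacency matrix, and the symmetric-KL disagreement score used as the OOC signal are all illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class GraphLayer(nn.Module):
    """One round of message passing over a dense object graph.
    `adj` is an (N, N) affinity matrix over the N objects in an image;
    each node averages its neighbors' features through a learned map.
    This is a generic GCN-style layer, not the paper's exact layer."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, feats, adj):
        norm = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.linear(norm @ feats))


class TwoGraphOOCModel(nn.Module):
    """Hypothetical two-branch model in the spirit of GCRN: a
    representation branch refines each object's appearance features
    from its neighbors, while a context branch predicts labels from
    contextual cues alone (e.g., co-occurrence, relative size, and
    position). Disagreement between the two label distributions
    serves as the out-of-context score."""

    def __init__(self, feat_dim, ctx_dim, num_classes):
        super().__init__()
        self.rep_graph = GraphLayer(feat_dim)
        self.ctx_graph = GraphLayer(ctx_dim)
        self.rep_head = nn.Linear(feat_dim, num_classes)
        self.ctx_head = nn.Linear(ctx_dim, num_classes)

    def forward(self, obj_feats, ctx_feats, adj):
        rep_logits = self.rep_head(self.rep_graph(obj_feats, adj))
        ctx_logits = self.ctx_head(self.ctx_graph(ctx_feats, adj))
        p = rep_logits.log_softmax(dim=-1)
        q = ctx_logits.log_softmax(dim=-1)
        # Symmetric KL divergence per object: in-context objects should
        # agree across branches, OOC objects should not.
        ooc_score = 0.5 * ((p.exp() * (p - q)).sum(-1)
                           + (q.exp() * (q - p)).sum(-1))
        return rep_logits, ctx_logits, ooc_score
```

The intended usage mirrors the hypothesis in the abstract: for in-context objects the two heads should agree, so a high disagreement score flags a likely OOC object.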
Related papers
- ContextHOI: Spatial Context Learning for Human-Object Interaction Detection [24.381821663963898]
Spatial contexts are considered critical in Human-Object Interaction (HOI) recognition.
We present a dual-branch framework named ContextHOI, which efficiently captures both object detection features and spatial contexts.
ContextHOI achieves state-of-the-art performance on the HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2024-12-12T08:21:19Z)
- Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding [77.26626173589746]
We present the Multi-view Approach to Grounding in Context (MAGiC), which selects an object referent based on language that distinguishes between two similar objects.
It improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9%.
arXiv Detail & Related papers (2023-11-12T00:21:58Z)
- Learning Object-Language Alignments for Open-Vocabulary Object Detection [83.09560814244524]
We propose a novel open-vocabulary object detection framework that learns directly from image-text pairs.
It enables us to train an open-vocabulary object detector in a simple and effective way.
arXiv Detail & Related papers (2022-11-27T14:47:31Z)
- Automatic dataset generation for specific object detection [6.346581421948067]
We present a method to synthesize object-in-scene images that preserves the objects' detailed features without bringing in irrelevant information.
Our results show that in the synthesized images, object boundaries blend well with the background.
arXiv Detail & Related papers (2022-07-16T07:44:33Z)
- Complex Scene Image Editing by Scene Graph Comprehension [17.72638225034884]
We propose SGC-Net, a two-stage method for complex scene image editing guided by scene graphs.
In the first stage, we train a Region of Interest (RoI) prediction network that uses scene graphs to predict the locations of the target objects.
The second stage uses a conditional diffusion model to edit the image based on our RoI predictions.
arXiv Detail & Related papers (2022-03-24T05:12:54Z)
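As a rough illustration of the two-stage interface in the SGC-Net entry above: stage one maps a scene-graph encoding to a target box, and stage two hands that box to a conditional inpainting model. Everything here (`RoIPredictor`, the pre-computed `graph_embedding`, and the `diffusion_inpaint` callable) is a hypothetical stand-in, not the paper's code.

```python
import torch.nn as nn


class RoIPredictor(nn.Module):
    """Stage 1 (hypothetical): map a scene-graph embedding to a target
    box. A real system would encode graph structure explicitly (e.g.,
    with a graph network); here the graph is assumed pre-embedded."""

    def __init__(self, graph_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(graph_dim, graph_dim), nn.ReLU(),
            nn.Linear(graph_dim, 4),  # (x1, y1, x2, y2), normalized
        )

    def forward(self, graph_embedding):
        return self.mlp(graph_embedding).sigmoid()


def edit_image(image, graph_embedding, roi_predictor, diffusion_inpaint):
    """Stage 2 (hypothetical): inpaint the predicted RoI with any
    off-the-shelf conditional diffusion sampler; `diffusion_inpaint`
    is a placeholder for such a model."""
    box = roi_predictor(graph_embedding)
    return diffusion_inpaint(image, box, cond=graph_embedding)
```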
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict pairwise visual relations between objects.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
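A minimal sketch of the relatedness idea in the R2-Net entry above, assuming PyTorch: a binary head over every pair of detected-object features predicts whether any relation holds, and those scores can then gate or regularize relation prediction in an SGG pipeline. The class name and architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RelatednessHead(nn.Module):
    """Hypothetical pair classifier in the spirit of R2-Net: given
    features of two detected objects, predict whether any visual
    relation holds between them."""

    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, feats):
        # feats: (N, dim) features for N detected objects.
        n, d = feats.shape
        a = feats.unsqueeze(1).expand(n, n, d)
        b = feats.unsqueeze(0).expand(n, n, d)
        pair = torch.cat([a, b], dim=-1)   # (N, N, 2*dim) ordered pairs
        return self.mlp(pair).squeeze(-1)  # (N, N) relatedness logits
```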
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method, SG2HOI, that exploits this scene-graph information for the Human-Object Interaction detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
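The two uses of scene-graph information in the SG2HOI entry above suggest a compact sketch: pool node embeddings into a global context vector, pass relation-aware messages between neighbors, then score human-object pairs against both. This is a hypothetical PyTorch rendering under assumed tensor shapes, not the published model.

```python
import torch
import torch.nn as nn


class SGContextHOI(nn.Module):
    """Hypothetical sketch of the two uses of scene-graph information:
    (1) pool node embeddings into a global scene context vector, and
    (2) pass relation-aware messages from each object's neighbors
    before scoring human-object pairs."""

    def __init__(self, dim, num_interactions):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from (neighbor, relation)
        self.head = nn.Linear(3 * dim, num_interactions)

    def forward(self, node_feats, rel_feats, adj, h_idx, o_idx):
        # node_feats: (N, dim); rel_feats: (N, N, dim); adj: (N, N) in {0, 1};
        # h_idx/o_idx: LongTensors indexing human and object nodes.
        global_ctx = node_feats.mean(dim=0)          # (1) global scene context
        n, d = node_feats.shape
        nbr = node_feats.unsqueeze(0).expand(n, n, d)
        messages = self.msg(torch.cat([nbr, rel_feats], dim=-1))
        updated = node_feats + (adj.unsqueeze(-1) * messages).sum(dim=1)
        pair = torch.cat([updated[h_idx], updated[o_idx],
                          global_ctx.unsqueeze(0).expand(len(h_idx), -1)],
                         dim=-1)
        return self.head(pair)                       # (2) interaction logits
```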
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) that processes images and sentences symmetrically with recurrent neural networks (RNNs).
Our model achieves state-of-the-art performance on the Flickr30K dataset and competitive performance on the MS-COCO dataset.
arXiv Detail & Related papers (2020-02-20T00:51:01Z)
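Treating objects "just like words," as the DP-RNN entry above describes, reduces to running both modalities through sequence encoders and comparing the results. A minimal hypothetical sketch, assuming pre-extracted word and object features and GRU encoders (the original model's object ordering and attention machinery are omitted):

```python
import torch.nn as nn
import torch.nn.functional as F


class DualPathMatcher(nn.Module):
    """Hypothetical dual-path matcher: encode the word sequence and the
    object sequence with symmetric RNNs, then score the image-text pair
    by cosine similarity of the final hidden states."""

    def __init__(self, word_dim, obj_dim, hidden):
        super().__init__()
        self.text_rnn = nn.GRU(word_dim, hidden, batch_first=True)
        self.image_rnn = nn.GRU(obj_dim, hidden, batch_first=True)

    def forward(self, words, objects):
        # words: (B, T, word_dim); objects: (B, N, obj_dim), with the
        # objects arranged in some sequence (the ordering itself is a
        # modeling choice in the original work).
        _, h_text = self.text_rnn(words)
        _, h_img = self.image_rnn(objects)
        return F.cosine_similarity(h_text[-1], h_img[-1], dim=-1)
```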