Open-Vocabulary Object Detection via Scene Graph Discovery
- URL: http://arxiv.org/abs/2307.03339v1
- Date: Fri, 7 Jul 2023 00:46:19 GMT
- Title: Open-Vocabulary Object Detection via Scene Graph Discovery
- Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai
- Abstract summary: Open-vocabulary (OV) object detection has attracted increasing research attention.
We propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection.
- Score: 53.27673119360868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, open-vocabulary (OV) object detection has attracted
increasing research attention. Unlike traditional detection, which only
recognizes fixed-category objects, OV detection aims to detect objects in an
open category set. Previous works often leverage vision-language (VL) training
data (e.g., referring grounding data) to recognize OV objects. However, they
only use pairs of nouns and individual objects in VL data, while these data
usually contain much more information, such as scene graphs, which are also
crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based
Discovery Network (SGDN) that exploits scene graph cues for OV detection.
Firstly, a scene-graph-based decoder (SGDecoder) including sparse
scene-graph-guided attention (SSGA) is presented. It captures scene graphs and
leverages them to discover OV objects. Secondly, we propose scene-graph-based
prediction (SGPred), where we build a scene-graph-based offset regression
(SGOR) mechanism to enable mutual enhancement between scene graph extraction
and object localization. Thirdly, we design a cross-modal learning mechanism in
SGPred. It takes scene graphs as bridges to improve the consistency between
cross-modal embeddings for OV object classification. Experiments on COCO and
LVIS demonstrate the effectiveness of our approach. Moreover, we show the
ability of our model for OV scene graph detection, while previous OV scene
graph generation methods cannot tackle this task.
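The core SSGA idea described above, attention restricted to objects connected in the scene graph, can be illustrated with a toy sketch. This is not the authors' implementation; the function name, the list-based vectors, and the 0/1 adjacency matrix are all illustrative assumptions.

```python
import math

def sparse_sg_attention(queries, keys, values, adjacency):
    """Toy sparse scene-graph-guided attention: each object query attends
    only to objects linked to it in the scene graph (plus itself).
    `adjacency[i][j]` is 1 if objects i and j share a scene-graph edge."""
    outputs = []
    for i, q in enumerate(queries):
        # Restrict attention to scene-graph neighbours of object i.
        allowed = [j for j in range(len(keys)) if j == i or adjacency[i][j]]
        # Scaled dot-product scores over the allowed set only.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(len(q))
                  for j in allowed]
        # Numerically stable softmax over the sparse score set.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        dim = len(values[0])
        outputs.append([sum(w * values[j][d] for w, j in zip(weights, allowed))
                        for d in range(dim)])
    return outputs
```

An object with no scene-graph neighbours attends only to itself and simply passes its value vector through, which is how sparsity keeps unrelated objects from mixing.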
Related papers
- Modeling Dynamic Environments with Scene Graph Memory [46.587536843634055]
We present a new type of link prediction problem: link prediction on partially observable dynamic graphs.
Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges.
We propose a novel state representation -- Scene Graph Memory (SGM) -- which captures the agent's accumulated set of observations.
We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen at homes.
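A minimal sketch of the memory-plus-link-prediction idea in this entry: accumulate observed room-to-object edges over time and score an unseen link by observation frequency. The class name, API, and frequency heuristic are illustrative assumptions, not the paper's SGM model.

```python
from collections import defaultdict

class SceneGraphMemory:
    """Toy scene-graph memory: rooms and objects are nodes, and each
    observation records which objects were seen in a room."""
    def __init__(self):
        self.counts = defaultdict(int)       # (room, obj) -> times observed
        self.room_totals = defaultdict(int)  # room -> number of visits

    def observe(self, room, objects):
        """Record one (possibly partial) observation of a room."""
        for obj in objects:
            self.counts[(room, obj)] += 1
        self.room_totals[room] += 1

    def link_probability(self, room, obj):
        """Predict a room-object link as the fraction of visits to `room`
        in which `obj` was present."""
        total = self.room_totals[room]
        return self.counts[(room, obj)] / total if total else 0.0
```

The partial-observability aspect falls out naturally: objects missing from a single visit only lower the link score rather than deleting the edge.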
arXiv Detail & Related papers (2023-05-27T17:39:38Z)
- Location-Free Scene Graph Generation [45.366540803729386]
Scene Graph Generation (SGG) is a visual understanding task, aiming to describe a scene as a graph of entities and their relationships with each other.
Existing works rely on location labels in form of bounding boxes or segmentation masks, increasing annotation costs and limiting dataset expansion.
We break this dependency and introduce location-free scene graph generation (LF-SGG)
This new task aims at predicting instances of entities, as well as their relationships, without the explicit calculation of their spatial localization.
arXiv Detail & Related papers (2023-03-20T08:57:45Z)
- Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current methods take a generation-by-classification approach, producing the scene graph by labeling all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
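The pseudo-labeling step this entry describes, matching detector outputs against concepts parsed from a caption, can be sketched as follows. The function name and the simple lowercase string match are illustrative assumptions; the actual method matches concepts far more carefully.

```python
def pseudo_labels(detections, caption_concepts):
    """Toy pseudo-labelling: keep detected regions whose label matches a
    concept parsed from the caption. `detections` is a list of
    (box, label) pairs from an off-the-shelf detector."""
    concepts = {c.lower() for c in caption_concepts}
    return [(box, label) for box, label in detections
            if label.lower() in concepts]
```

Regions whose labels never appear among the caption's concepts are simply dropped, so only caption-grounded boxes become training signal for the scene graph.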
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs [54.13919050090926]
We propose an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs.
In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose the node labeling function to identify the role of each node in the subgraph.
Based on the extracted features, we utilize Gated recurrent units (GRUs) to capture the temporal information for anomaly detection.
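The $h$-hop enclosing subgraph extraction this entry describes can be sketched with a plain BFS from both endpoints of the target edge. This is a rough illustration under assumed inputs (an adjacency dict over hashable nodes), not the paper's implementation, and it omits the node labeling function and the GRU stage.

```python
from collections import deque

def h_hop_subgraph(adj, edge, h):
    """Extract the h-hop enclosing subgraph around a target edge (u, v):
    all nodes within h hops of either endpoint, plus the edges among them.
    `adj` maps each node to an iterable of its neighbours."""
    u, v = edge
    dist = {u: 0, v: 0}           # BFS from both endpoints at once
    queue = deque([u, v])
    while queue:
        node = queue.popleft()
        if dist[node] == h:       # stop expanding at the hop limit
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    edges = {(a, b) for a in nodes for b in adj.get(a, ()) if b in nodes}
    return nodes, edges
```

In the full model, each extracted subgraph would then be labeled per node and fed to the GRU-based temporal component; here only the structural extraction is shown.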
arXiv Detail & Related papers (2020-05-15T09:17:08Z)
- GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases (VG, OI, and VRD) with significant gains under various settings and metrics.
arXiv Detail & Related papers (2020-03-29T07:22:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.