Fully Convolutional Scene Graph Generation
- URL: http://arxiv.org/abs/2103.16083v1
- Date: Tue, 30 Mar 2021 05:25:38 GMT
- Title: Fully Convolutional Scene Graph Generation
- Authors: Hengyue Liu, Ning Yan, Masood S. Mortazavi, Bir Bhanu
- Abstract summary: This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
FCSGG encodes objects as bounding box center points, and relationships as 2D vector fields called Relation Affinity Fields (RAFs).
FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.
- Score: 30.194961716870186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a fully convolutional scene graph generation (FCSGG)
model that detects objects and relations simultaneously. Most of the scene
graph generation frameworks use a pre-trained two-stage object detector, like
Faster R-CNN, and build scene graphs using bounding box features. Such pipelines
usually have a large number of parameters and low inference speed. Unlike these
approaches, FCSGG is a conceptually elegant and efficient bottom-up approach
that encodes objects as bounding box center points, and relationships as 2D
vector fields which are named as Relation Affinity Fields (RAFs). RAFs encode
both semantic and spatial features, and explicitly represent the relationship
between a pair of objects by the integral on a sub-region that points from
subject to object. FCSGG only utilizes visual features and still generates
strong results for scene graph generation. Comprehensive experiments on the
Visual Genome dataset demonstrate the efficacy, efficiency, and
generalizability of the proposed method. FCSGG achieves highly competitive
results on recall and zero-shot recall with significantly reduced inference
time.
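The abstract describes a relationship as an integral over a 2D vector field on a region pointing from subject to object. A minimal sketch of that idea follows, assuming a simplified discrete line integral sampled along the straight segment between two box centers rather than over a full sub-region. The function name `raf_score`, the per-component field layout (`field_x`, `field_y` as row-major grids), and `num_samples` are illustrative assumptions, not the paper's implementation.

```python
import math


def raf_score(field_x, field_y, subject_center, object_center, num_samples=16):
    """Average alignment of a 2D vector field with the subject->object direction.

    field_x, field_y: row-major grids (lists of rows) with the x and y
        components of the field at each pixel. Hypothetical layout.
    subject_center, object_center: (x, y) pixel coordinates of box centers.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    sx, sy = subject_center
    ox, oy = object_center
    dx, dy = ox - sx, oy - sy
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # coincident centers: no direction to integrate along
    ux, uy = dx / length, dy / length  # unit vector from subject to object
    height, width = len(field_x), len(field_x[0])
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)
        # Nearest-pixel lookup, clamped to the grid bounds.
        col = min(max(round(sx + t * dx), 0), width - 1)
        row = min(max(round(sy + t * dy), 0), height - 1)
        # Dot product of the field vector with the segment direction.
        total += field_x[row][col] * ux + field_y[row][col] * uy
    return total / num_samples


# Toy 8x8 field that points in +x along row 3, columns 1..6.
size = 8
field_x = [[0.0] * size for _ in range(size)]
field_y = [[0.0] * size for _ in range(size)]
for col in range(1, 7):
    field_x[3][col] = 1.0

forward = raf_score(field_x, field_y, (1, 3), (6, 3))   # 1.0
backward = raf_score(field_x, field_y, (6, 3), (1, 3))  # -1.0
print(forward, backward)
```

In this toy case the score is positive when the field points from subject to object and negative for the reversed pair, which illustrates why a directed vector field can distinguish the subject from the object in a relationship.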
Related papers
- AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation [40.149652254414185]
This paper constructs and releases an aerial image urban scene graph generation (AUG) dataset.
Images in the AUG dataset are captured from a low-altitude overhead view.
To avoid the local context being overwhelmed in the complex aerial urban scene, this paper proposes a new locality-preserving graph convolutional network (LPG).
arXiv Detail & Related papers (2024-04-11T14:29:30Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- HL-Net: Heterophily Learning Network for Scene Graph Generation [90.2766568914452]
We propose a novel Heterophily Learning Network (HL-Net) to explore the homophily and heterophily between objects/relationships in scene graphs.
HL-Net comprises the following: 1) an adaptive reweighting transformer module, which adaptively integrates the information from different layers to exploit both the heterophily and homophily in objects.
We conducted extensive experiments on two public datasets: Visual Genome (VG) and Open Images (OI).
arXiv Detail & Related papers (2022-05-03T06:00:29Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
- Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.