Relation Regularized Scene Graph Generation
- URL: http://arxiv.org/abs/2202.10826v1
- Date: Tue, 22 Feb 2022 11:36:49 GMT
- Title: Relation Regularized Scene Graph Generation
- Authors: Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao
Shen, Xuelong Li
- Abstract summary: Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
- Score: 206.76762860019065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graph generation (SGG) is built on top of detected objects to predict
object pairwise visual relations for describing the image content abstraction.
Existing works have revealed that if the links between objects are given as
prior knowledge, the performance of SGG is significantly improved. Inspired by
this observation, in this article, we propose a relation regularized network
(R2-Net), which can predict whether there is a relationship between two objects
and encode this relation into object feature refinement for better SGG.
Specifically, we first construct an affinity matrix among detected objects to
represent the probability of a relationship between two objects. Graph
convolution networks (GCNs) over this relation affinity matrix are then used as
object encoders, producing relation-regularized representations of objects.
With these relation-regularized features, our R2-Net can effectively refine
object labels and generate scene graphs. Extensive experiments are conducted on
the Visual Genome dataset for three SGG tasks (i.e., predicate classification,
scene graph classification, and scene graph detection), demonstrating the
effectiveness of our proposed method. Ablation studies also verify the key
roles of our proposed components in performance improvement.
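The abstract's core pipeline (score a relation affinity between every object pair, then run graph convolutions over that affinity matrix to refine object features) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's exact architecture: the sigmoid pair scorer, the normalized-adjacency GCN layer, and all dimensions are assumptions.

```python
import numpy as np

def relation_affinity_matrix(features, scorer_w):
    """Estimate a relationship probability for every ordered object pair
    by scoring the concatenated pair features (sigmoid of a linear score)."""
    n, _ = features.shape
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = np.concatenate([features[i], features[j]])
                A[i, j] = 1.0 / (1.0 + np.exp(-pair @ scorer_w))
    return A

def gcn_layer(features, A, W):
    """One graph-convolution step over the affinity matrix:
    add self-loops, row-normalize, aggregate, project, ReLU."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat / deg) @ features @ W, 0.0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))      # 4 detected objects, 8-dim features
A = relation_affinity_matrix(feats, rng.normal(size=16))
refined = gcn_layer(feats, A, rng.normal(size=(8, 8)))
print(refined.shape)  # (4, 8): relation-regularized object features
```

The refined features would then feed the downstream object-label and predicate classifiers; in the paper the affinity scorer and GCN weights are learned end to end rather than drawn at random as here.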
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which reflects the Co-occurrence Knowledge between objects, and the learnable term frequency-inverse document frequency.
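The co-occurrence-knowledge idea can be illustrated with a toy sketch: count how often object classes appear together across training images, then apply a TF-IDF-style reweighting so that ubiquitous classes contribute less. The toy image list, the symmetric counting rule, and the IDF formula are all assumptions for illustration, not the CooK paper's exact construction.

```python
import numpy as np

# Toy training "images", each a list of detected object labels (assumed data).
images = [["person", "dog", "leash"],
          ["person", "bike"],
          ["dog", "ball"],
          ["person", "dog"]]
vocab = sorted({o for img in images for o in img})
idx = {o: i for i, o in enumerate(vocab)}

# Co-occurrence counts: how often two object classes appear in the same image.
C = np.zeros((len(vocab), len(vocab)))
for img in images:
    for a in img:
        for b in img:
            if a != b:
                C[idx[a], idx[b]] += 1

# IDF-style reweighting: classes that occur in many images are down-weighted.
df = np.array([sum(o in img for img in images) for o in vocab])
idf = np.log(len(images) / df)
weighted = C * idf[None, :]
print(weighted[idx["person"], idx["dog"]])
```

In CooK the term-frequency component is additionally learnable, whereas this sketch uses fixed counts.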
arXiv Detail & Related papers (2024-05-21T09:56:48Z)
- EGTR: Extracting Graph from Transformer for Scene Graph Generation [5.935927309154952]
Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects.
We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder.
We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets.
arXiv Detail & Related papers (2024-04-02T16:20:02Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Detecting Objects with Context-Likelihood Graphs and Graph Refinement [45.70356990655389]
The goal of this paper is to detect objects by exploiting their interrelations. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly.
We propose a novel way of creating a graphical representation of an image from inter-object relations and initial class predictions, we call a context-likelihood graph.
We then learn the joint distribution with an energy-based modeling technique, which allows us to sample and refine the context-likelihood graph iteratively for a given image.
arXiv Detail & Related papers (2022-12-23T15:27:21Z)
- HL-Net: Heterophily Learning Network for Scene Graph Generation [90.2766568914452]
We propose a novel Heterophily Learning Network (HL-Net) to explore the homophily and heterophily between objects/relationships in scene graphs.
HL-Net comprises the following: 1) an adaptive reweighting transformer module, which adaptively integrates the information from different layers to exploit both the heterophily and homophily among objects.
We conducted extensive experiments on two public datasets: Visual Genome (VG) and Open Images (OI).
arXiv Detail & Related papers (2022-05-03T06:00:29Z)
- Fully Convolutional Scene Graph Generation [30.194961716870186]
This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
FCSGG encodes objects as bounding box center points, and relationships as 2D vector fields, which are named Relation Affinity Fields (RAFs).
FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.
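The RAF encoding can be sketched roughly as follows: one relation is rasterized as a 2D vector field in which grid cells near the subject-to-object line segment store the unit direction vector from subject to object. This is a hypothetical toy illustration under assumed conventions (grid size, the distance-based fill rule, and the `radius` parameter are not from the paper).

```python
import numpy as np

def relation_affinity_field(h, w, subj_center, obj_center, radius=1.5):
    """Rasterize one relation as a 2D vector field: cells within `radius`
    of the subject->object segment store the unit direction vector."""
    field = np.zeros((h, w, 2))
    s = np.asarray(subj_center, dtype=float)
    o = np.asarray(obj_center, dtype=float)
    v = o - s
    length = np.linalg.norm(v)
    if length == 0:
        return field
    u = v / length  # unit direction from subject to object
    for y in range(h):
        for x in range(w):
            p = np.array([x, y], dtype=float) - s
            along = p @ u                           # projection onto the segment
            perp = abs(p[0] * u[1] - p[1] * u[0])   # distance from the line
            if 0 <= along <= length and perp <= radius:
                field[y, x] = u
    return field

raf = relation_affinity_field(8, 8, subj_center=(1, 1), obj_center=(6, 6))
print(raf.shape)  # (8, 8, 2)
```

In FCSGG such fields are regressed by a fully convolutional head alongside the object center heatmaps, so detection and relation prediction share one pass.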
arXiv Detail & Related papers (2021-03-30T05:25:38Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases: VG, OI, and VRD by significant gains under various settings and metrics.
arXiv Detail & Related papers (2020-03-29T07:22:31Z)
- Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.