Tackling the Challenges in Scene Graph Generation with Local-to-Global
Interactions
- URL: http://arxiv.org/abs/2106.08543v1
- Date: Wed, 16 Jun 2021 03:58:21 GMT
- Title: Tackling the Challenges in Scene Graph Generation with Local-to-Global
Interactions
- Authors: Sangmin Woo, Junhyug Noh, Kangil Kim
- Abstract summary: We seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task.
Motivated by the analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN).
Our framework enables predicting the scene graph in a local-to-global manner by design, leveraging the possible complementariness.
- Score: 4.726777092009554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we seek new insights into the underlying challenges of the
Scene Graph Generation (SGG) task. Quantitative and qualitative analysis of the
Visual Genome dataset implies -- 1) Ambiguity: even if inter-object
relationships contain the same object (or predicate), they may not be visually
or semantically similar, 2) Asymmetry: although relationships are inherently
directional, direction was not well addressed in previous studies, and
3) Higher-order contexts: leveraging the identities of certain graph elements
can help to generate accurate scene graphs. Motivated by the analysis, we
design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN).
Locally, interactions extract the essence between three instances - subject,
object, and background - while baking direction awareness into the network by
constraining the input order. Globally, interactions encode the contexts
between all graph components -- nodes and edges. We also introduce the Attract &
Repel loss, which finely adjusts predicate embeddings. Our framework enables
predicting the scene graph in a local-to-global manner by design, leveraging
the possible complementariness. To quantify how much LOGIN is aware of
relational direction, we propose a new diagnostic task called Bidirectional
Relationship Classification (BRC). We see that LOGIN distinguishes relational
direction more successfully than existing methods (in the BRC task) while
showing state-of-the-art results on the Visual Genome benchmark (in the SGG task).
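The abstract's point about baking in direction awareness by constraining the input order can be illustrated with a toy sketch. This is not the LOGIN architecture: the function names, the feature vectors, and the two fusion rules below are hypothetical, chosen only to show why a fixed (subject, object, background) order keeps direction visible while a symmetric combination does not.

```python
# Illustrative only: NOT the LOGIN model. All names and values are hypothetical.

def ordered_fusion(subject, obj, background):
    """Concatenate features in a fixed (subject, object, background) order."""
    return list(subject) + list(obj) + list(background)


def symmetric_fusion(subject, obj, background):
    """Element-wise sum, which discards which input played the subject role."""
    return [s + o + b for s, o, b in zip(subject, obj, background)]


# Toy feature vectors (assumed values, for illustration).
person = [1.0, 0.0]
horse = [0.0, 1.0]
scene = [0.5, 0.5]

# The same pair of instances, read in the two possible directions.
forward = ordered_fusion(person, horse, scene)
reverse = ordered_fusion(horse, person, scene)

# Ordered fusion yields different inputs for the two directions.
direction_visible_ordered = forward != reverse

# Symmetric fusion yields identical inputs, so direction cannot be recovered.
direction_visible_symmetric = (
    symmetric_fusion(person, horse, scene) != symmetric_fusion(horse, person, scene)
)

print(direction_visible_ordered, direction_visible_symmetric)
```

Under these assumptions, a classifier fed the ordered representation can in principle assign different predicates to the two directions, whereas one fed the symmetric representation must give both directions the same prediction.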
Related papers
- Beyond Entity Alignment: Towards Complete Knowledge Graph Alignment via Entity-Relation Synergy [14.459419325027612]
Knowledge Graph alignment aims to integrate knowledge from multiple sources to address the limitations of individual Knowledge Graphs.
Existing models primarily emphasize the linkage of cross-graph entities but overlook aligning relations across KGs.
We propose a novel Expectation-Maximization-based model, EREM, which iteratively optimizes both sub-tasks.
arXiv Detail & Related papers (2024-07-25T03:40:09Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation
and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and
Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Hyper-relationship Learning Network for Scene Graph Generation [95.6796681398668]
We propose a hyper-relationship learning network, termed HLN, for scene graph generation.
We evaluate HLN on the most popular SGG dataset, i.e., the Visual Genome dataset.
For example, the proposed HLN improves the recall per relationship from 11.3% to 13.1%, and maintains the recall per image from 19.8% to 34.9%.
arXiv Detail & Related papers (2022-02-15T09:26:16Z)
- DigNet: Digging Clues from Local-Global Interactive Graph for
Aspect-level Sentiment Classification [0.685316573653194]
In aspect-level sentiment classification (ASC), state-of-the-art models encode either syntax graph or relation graph.
We design a novel local-global interactive graph, which marries their advantages by stitching the two graphs via interactive edges.
In this paper, we propose a novel neural network termed DigNet, whose core module is the stacked local-global interactive layers.
arXiv Detail & Related papers (2022-01-04T05:34:02Z)
- Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge
Integration [9.203403318435486]
We propose CommOnsense-integrAted sCenegrapHrElation pRediction (COACHER), a framework to integrate commonsense knowledge for scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph.
arXiv Detail & Related papers (2021-07-11T16:22:45Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object
Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)