From General to Specific: Informative Scene Graph Generation via Balance
Adjustment
- URL: http://arxiv.org/abs/2108.13129v1
- Date: Mon, 30 Aug 2021 11:39:43 GMT
- Title: From General to Specific: Informative Scene Graph Generation via Balance
Adjustment
- Authors: Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng
Tao Shen, Jingkuan Song
- Abstract summary: Current models are stuck in common predicates, e.g., "on" and "at", rather than informative ones.
We propose BA-SGG, a framework based on balance adjustment rather than conventional distribution fitting.
Our method achieves 14.3%, 8.0%, and 6.1% higher Mean Recall (mR) than the Transformer model on three scene graph generation sub-tasks on Visual Genome.
- Score: 113.04103371481067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scene graph generation (SGG) task aims to detect visual relationship
triplets, i.e., (subject, predicate, object), in an image, providing a structural
vision layout for scene understanding. However, current models are stuck in
common predicates, e.g., "on" and "at", rather than informative ones, e.g.,
"standing on" and "looking at", which sacrifices precise information and hurts
overall performance. If a model only uses "stone on road" rather than
"blocking" to describe an image, the scene is easily misunderstood. We
argue that this phenomenon is caused by two key imbalances between informative
predicates and common ones: semantic space level imbalance and training
sample level imbalance. To tackle this problem, we propose BA-SGG, a simple yet
effective SGG framework based on balance adjustment rather than conventional
distribution fitting. It integrates two components, Semantic Adjustment (SA)
and Balanced Predicate Learning (BPL), which address these two imbalances
respectively. Because the adjustment process is model-agnostic, our method is easily
applied to state-of-the-art SGG models and significantly improves SGG
performance. Our method achieves 14.3%, 8.0%, and 6.1% higher Mean Recall (mR)
than the Transformer model on three scene graph generation sub-tasks on
Visual Genome, respectively. Codes are publicly available.
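The reported gains use Mean Recall (mR@K), which averages recall per predicate class, so informative but rare predicates count as much as common ones such as "on". Below is a minimal sketch of that metric, assuming predicted triplets are matched to ground truth by label only (real SGG evaluation additionally matches subject/object boxes by IoU); the function and argument names are illustrative, not from the paper's code.

```python
import numpy as np

def mean_recall_at_k(gt_triplets, pred_triplets, num_predicates, k=100):
    """Mean Recall (mR@K): average the per-predicate recall over predicate
    classes, so rare but informative predicates are not drowned out.

    gt_triplets:   list of (subj_label, predicate_id, obj_label)
    pred_triplets: list of (subj_label, predicate_id, obj_label, score)
    """
    # Keep only the top-K scoring predictions, as in standard SGG evaluation.
    topk = sorted(pred_triplets, key=lambda t: t[3], reverse=True)[:k]
    predicted = {(s, p, o) for s, p, o, _ in topk}

    hits = np.zeros(num_predicates)
    totals = np.zeros(num_predicates)
    for s, p, o in gt_triplets:
        totals[p] += 1
        if (s, p, o) in predicted:
            hits[p] += 1

    # Average recall only over predicate classes present in the ground truth.
    present = totals > 0
    if not present.any():
        return 0.0
    return float(np.mean(hits[present] / totals[present]))
```

Under this metric, a model that predicts "on" for nearly every relation can still score well on plain Recall but poorly on mR, which is exactly the imbalance that balance adjustment targets.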
Related papers
- Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models [6.8754535229258975]
Scene Graph Generation (SGG) provides basic language representation of visual scenes.
Some test triplets are rare or even unseen during training, leading to unreliable predictions.
We propose using the SGG models with pretrained vision-language models (VLMs) to enhance representation.
arXiv Detail & Related papers (2024-03-24T15:02:24Z) - GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives [69.36723767339001]
We propose a novel framework named GPT4SGG to obtain more accurate and comprehensive scene graph signals.
We show GPT4SGG significantly improves the performance of SGG models trained on image-caption data.
arXiv Detail & Related papers (2023-12-07T14:11:00Z) - Informative Scene Graph Generation via Debiasing [111.36290856077584]
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object).
Due to biases in data, current models tend to predict common predicates.
We propose DB-SGG, an effective framework based on debiasing rather than conventional distribution fitting.
arXiv Detail & Related papers (2023-08-10T02:04:01Z) - Towards Unseen Triples: Effective Text-Image-joint Learning for Scene
Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z) - Towards Open-vocabulary Scene Graph Generation with Prompt-based
Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z) - Fine-Grained Predicates Learning for Scene Graph Generation [155.48614435437355]
Fine-Grained Predicates Learning aims at differentiating among hard-to-distinguish predicates for Scene Graph Generation task.
We introduce a Predicate Lattice that helps SGG models to figure out fine-grained predicate pairs.
We then propose a Category Discriminating Loss and an Entity Discriminating Loss, which both contribute to distinguishing fine-grained predicates.
arXiv Detail & Related papers (2022-04-06T06:20:09Z) - Not All Relations are Equal: Mining Informative Labels for Scene Graph
Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z) - Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z)