From General to Specific: Informative Scene Graph Generation via Balance
Adjustment
- URL: http://arxiv.org/abs/2108.13129v1
- Date: Mon, 30 Aug 2021 11:39:43 GMT
- Title: From General to Specific: Informative Scene Graph Generation via Balance
Adjustment
- Authors: Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng
Tao Shen, Jingkuan Song
- Abstract summary: Current models are stuck in common predicates, e.g., "on" and "at", rather than informative ones.
We propose BA-SGG, a framework based on balance adjustment rather than conventional distribution fitting.
Our method achieves 14.3%, 8.0%, and 6.1% higher Mean Recall (mR) than the Transformer model on three scene graph generation sub-tasks on Visual Genome.
- Score: 113.04103371481067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scene graph generation (SGG) task aims to detect visual relationship
triplets, i.e., (subject, predicate, object), in an image, providing a structural
vision layout for scene understanding. However, current models are stuck in
common predicates, e.g., "on" and "at", rather than informative ones, e.g.,
"standing on" and "looking at", which sacrifices precise information and hurts
overall performance. If a model only uses "stone on road" rather than
"blocking" to describe an image, the scene is easily misunderstood. We
argue that this phenomenon is caused by two key imbalances between informative
predicates and common ones: semantic space level imbalance and training
sample level imbalance. To tackle this problem, we propose BA-SGG, a simple yet
effective SGG framework based on balance adjustment rather than conventional
distribution fitting. It integrates two components, Semantic Adjustment (SA)
and Balanced Predicate Learning (BPL), which address these two imbalances
respectively. Because the adjustment process is model-agnostic, our method is easily
applied to state-of-the-art SGG models and significantly improves SGG
performance. Our method achieves 14.3%, 8.0%, and 6.1% higher Mean Recall (mR)
than the Transformer model on three scene graph generation sub-tasks on
Visual Genome, respectively. Codes are publicly available.
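The reported gains use Mean Recall (mR@K), which averages recall per predicate class, so informative but rare predicates count as much as common ones such as "on". Below is a minimal sketch of that metric, assuming predicted triplets are matched to ground truth by label only (real SGG evaluation additionally matches subject/object boxes by IoU); the function and argument names are illustrative, not from the paper's code.

```python
import numpy as np

def mean_recall_at_k(gt_triplets, pred_triplets, num_predicates, k=100):
    """Mean Recall (mR@K): average the per-predicate recall over predicate
    classes, so rare but informative predicates are not drowned out.

    gt_triplets:   list of (subj_label, predicate_id, obj_label)
    pred_triplets: list of (subj_label, predicate_id, obj_label, score)
    """
    # Keep only the top-K scoring predictions, as in standard SGG evaluation.
    topk = sorted(pred_triplets, key=lambda t: t[3], reverse=True)[:k]
    predicted = {(s, p, o) for s, p, o, _ in topk}

    hits = np.zeros(num_predicates)
    totals = np.zeros(num_predicates)
    for s, p, o in gt_triplets:
        totals[p] += 1
        if (s, p, o) in predicted:
            hits[p] += 1

    # Average recall only over predicate classes present in the ground truth.
    present = totals > 0
    if not present.any():
        return 0.0
    return float(np.mean(hits[present] / totals[present]))
```

Under this metric, a model that predicts "on" for nearly every relation can still score well on plain Recall but poorly on mR, which is exactly the imbalance that balance adjustment targets.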
Related papers
- Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models [6.8754535229258975]
Scene Graph Generation (SGG) provides basic language representation of visual scenes.
Some test triplets are rare or even unseen during training, leading to unreliable predictions.
We propose using the SGG models with pretrained vision-language models (VLMs) to enhance representation.
arXiv Detail & Related papers (2024-03-24T15:02:24Z) - GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives [69.36723767339001]
We propose a novel framework named GPT4SGG to obtain more accurate and comprehensive scene graph signals.
We show GPT4SGG significantly improves the performance of SGG models trained on image-caption data.
arXiv Detail & Related papers (2023-12-07T14:11:00Z) - Informative Scene Graph Generation via Debiasing [111.36290856077584]
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object).
Due to biases in data, current models tend to predict common predicates.
We propose DB-SGG, an effective framework based on debiasing rather than conventional distribution fitting.
arXiv Detail & Related papers (2023-08-10T02:04:01Z) - Towards Unseen Triples: Effective Text-Image-joint Learning for Scene
Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z) - Towards Open-vocabulary Scene Graph Generation with Prompt-based
Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z) - Fine-Grained Predicates Learning for Scene Graph Generation [155.48614435437355]
Fine-Grained Predicates Learning aims at differentiating among hard-to-distinguish predicates for Scene Graph Generation task.
We introduce a Predicate Lattice that helps SGG models to figure out fine-grained predicate pairs.
We then propose a Category Discriminating Loss and an Entity Discriminating Loss, which both contribute to distinguishing fine-grained predicates.
arXiv Detail & Related papers (2022-04-06T06:20:09Z) - Not All Relations are Equal: Mining Informative Labels for Scene Graph
Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z) - Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z)