Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation
- URL: http://arxiv.org/abs/2601.08728v1
- Date: Tue, 13 Jan 2026 16:57:09 GMT
- Title: Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation
- Authors: Runfeng Qu, Ole Hall, Pia K Bideau, Julie Ouerfelli-Ethier, Martin Rolfs, Klaus Obermayer, Olaf Hellwich
- Abstract summary: We introduce Salience-SGG, a framework featuring an Iterative Salience Decoder (ISD) that emphasizes triplets with salient spatial structures. We show that Salience-SGG achieves state-of-the-art performance and improves the spatial understanding of existing Unbiased-SGG methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) suffers from a long-tailed distribution, where a few predicate classes dominate while many others are underrepresented, leading to biased models that underperform on rare relations. Unbiased-SGG methods address this issue by implementing debiasing strategies, but often at the cost of spatial understanding, resulting in an over-reliance on semantic priors. We introduce Salience-SGG, a novel framework featuring an Iterative Salience Decoder (ISD) that emphasizes triplets with salient spatial structures. To support this, we propose semantic-agnostic salience labels to guide the ISD. Evaluations on Visual Genome, Open Images V6, and GQA-200 show that Salience-SGG achieves state-of-the-art performance and improves the spatial understanding of existing Unbiased-SGG methods, as demonstrated by the Pairwise Localization Average Precision metric.
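The bias described above can be made concrete with a toy example. The sketch below (hypothetical triplets and a deliberately degenerate model, not Visual Genome data or the paper's method) contrasts overall recall with mean per-predicate recall (the basis of the mR@K metric commonly used in unbiased SGG): a model that always predicts the most frequent predicate still scores well on plain recall, while mean recall exposes its failure on rare relations.

```python
from collections import Counter

# Ground-truth scene graph triplets: (subject, predicate, object).
# These are illustrative examples, not a real dataset.
triplets = [
    ("man", "on", "horse"),
    ("cup", "on", "table"),
    ("dog", "on", "grass"),
    ("man", "riding", "horse"),
    ("bird", "perched on", "branch"),
]

# A degenerate "biased" model: always predict the most frequent predicate.
head_predicate = Counter(p for _, p, _ in triplets).most_common(1)[0][0]
predictions = [(s, head_predicate, o) for s, _, o in triplets]

def recall(gt, pred):
    # Fraction of ground-truth triplets predicted exactly.
    return sum(g == p for g, p in zip(gt, pred)) / len(gt)

def mean_recall(gt, pred):
    # Recall averaged per predicate class, so rare predicates count equally.
    per_class = {}
    for g, p in zip(gt, pred):
        hits, total = per_class.get(g[1], (0, 0))
        per_class[g[1]] = (hits + (g == p), total + 1)
    return sum(h / n for h, n in per_class.values()) / len(per_class)

print(recall(triplets, predictions))       # 0.6  -- looks decent
print(mean_recall(triplets, predictions))  # 0.333... -- exposes the bias
```

This is why unbiased SGG work reports mean recall alongside recall: the head class ("on" here) inflates the latter while the rare predicates are never predicted at all.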
Related papers
- PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks [51.31903029903904]
In Scene Graph Generation (SGG), one extracts a structured representation from visual inputs in the form of object nodes and the predicates connecting them. PRISM-0 is a framework for zero-shot open-vocabulary SGG that bootstraps foundation models in a bottom-up approach. PRISM-0 generates semantically meaningful graphs that improve downstream tasks such as Image Captioning and Sentence-to-Graph Retrieval.
arXiv Detail & Related papers (2025-04-01T14:29:51Z) - RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning [24.52282123604646]
Scene Graph Generation (SGG) research has suffered from two fundamental challenges: the long-tailed predicate distribution and semantic ambiguity between predicates. We propose Retrieval-Augmented Scene Graph Generation (RA-SGG), which identifies potential instances to be multi-labeled and enriches the single label with multi-labels that are semantically similar to the original label. RA-SGG effectively alleviates the biased prediction caused by the long-tailed distribution and semantic ambiguity of predicates.
arXiv Detail & Related papers (2024-12-17T10:47:13Z) - Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement [6.8754535229258975]
Scene Graph Generation (SGG) provides a basic language representation of visual scenes. Part of the triplet labels are rare or even unseen during training, resulting in imprecise predictions. We propose integrating pretrained Vision-Language Models to enhance representation.
arXiv Detail & Related papers (2024-03-24T15:02:24Z) - Adaptive Self-training Framework for Fine-grained Scene Graph Generation [29.37568710952893]
Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets.
We introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets.
Our experiments verify the effectiveness of ST-SGG on various SGG models.
arXiv Detail & Related papers (2024-01-18T08:10:34Z) - Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention [69.36723767339001]
Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications.
We propose a unified framework named OvSGTR towards fully open vocabulary SGG from a holistic view.
For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pretraining.
arXiv Detail & Related papers (2023-11-18T06:49:17Z) - Informative Scene Graph Generation via Debiasing [124.71164256146342]
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object).
Due to biases in data, current models tend to predict common predicates.
We propose DB-SGG, an effective framework based on debiasing but not the conventional distribution fitting.
arXiv Detail & Related papers (2023-08-10T02:04:01Z) - CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation [10.724516317292926]
We present a simple yet effective method called Context-Aware Mixture-of-Experts (CAME) to improve the model diversity and alleviate the biased scene graph generator.
We conduct extensive experiments on three tasks on the Visual Genome dataset to show that CAME achieves superior performance over previous methods.
arXiv Detail & Related papers (2022-08-15T10:39:55Z) - Learning To Generate Scene Graph from Head to Tail [65.48134724633472]
We propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT).
CRM first learns head/easy samples to obtain robust features of head predicates, then gradually focuses on tail/hard ones.
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
arXiv Detail & Related papers (2022-06-23T12:16:44Z) - Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z) - Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract more descriptive features on each selected band, preserving local structures and subtle changes of a region.
Secondly, adaptive neighbor assignment is introduced in the construction of the anchor graph to reduce computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
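Several of the papers above counter the long-tailed predicate distribution with frequency-based debiasing. As a generic illustration (not the method of any specific paper listed, and with hypothetical predicate counts), the sketch below computes inverse-log-frequency class weights of the kind often used to reweight the predicate classification loss, so that rare predicates contribute more per example than head predicates.

```python
import math

# Hypothetical predicate frequencies with a long tail.
predicate_counts = {"on": 9000, "riding": 300, "perched on": 30}

def inverse_freq_weights(counts, smoothing=1.0):
    # Weight each class by 1 / log(count + smoothing + 1),
    # then normalize so the mean weight is 1.0.
    raw = {c: 1.0 / math.log(n + smoothing + 1.0) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

weights = inverse_freq_weights(predicate_counts)
# Rare predicates receive larger loss weights than frequent ones.
print(weights)
```

In a training loop, such weights would typically multiply the per-example cross-entropy loss for each predicate class; the log damping keeps the head class from being suppressed entirely.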
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.