Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation
- URL: http://arxiv.org/abs/2504.12606v1
- Date: Thu, 17 Apr 2025 03:09:22 GMT
- Title: Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation
- Authors: Changsheng Lv, Mengshi Qi, Zijian Fu, Huadong Ma
- Abstract summary: We introduce a novel method named Robo-SGG, i.e., Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation. Our proposed Robo-SGG module is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. We achieve relative improvements of 5.6%, 8.0%, and 6.5% in mR@50 for the PredCls, SGCls, and SGDet tasks, respectively, and achieve new state-of-the-art performance on the corrupted scene graph generation benchmarks (VG-C and GQA-C).
- Score: 22.58434223222062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel method named Robo-SGG, i.e., Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation. Compared to the existing SGG setting, robust scene graph generation aims to perform inference on a diverse range of corrupted images, with the core challenge being the domain shift between clean and corrupted images. Existing SGG methods suffer from degraded performance due to compromised visual features, e.g., from corruption interference or occlusions. To obtain robust visual features, we exploit layout information, which is domain-invariant, to enhance the efficacy of existing SGG methods on corrupted images. Specifically, we employ Instance Normalization (IN) to filter out domain-specific features and recover the unchangeable structural features, i.e., the positional and semantic relationships among objects, via the proposed Layout-Oriented Restitution. Additionally, we propose a Layout-Embedded Encoder (LEE) that augments the existing object and predicate encoders within the SGG framework, enriching the robust positional and semantic features of objects and predicates. Note that our proposed Robo-SGG module is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. Extensive experiments demonstrate that by integrating our proposed Robo-SGG into a state-of-the-art method, we achieve relative improvements of 5.6%, 8.0%, and 6.5% in mR@50 for the PredCls, SGCls, and SGDet tasks on the VG-C dataset, respectively, and achieve new state-of-the-art performance on the corrupted scene graph generation benchmarks (VG-C and GQA-C). We will release our source code and model.
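The abstract's core mechanism, suppressing corruption-specific feature statistics with Instance Normalization and then restituting domain-invariant layout (positional and semantic) information, can be illustrated with a minimal sketch. This is not the authors' released implementation: the module name, feature dimensions, and the way the layout embedding is built from bounding boxes and class labels are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class LayoutNormRestitution(nn.Module):
    """Hypothetical sketch of layout-oriented normalization + restitution.

    Instance Normalization strips corruption-specific (domain) statistics from
    per-object visual features; a layout embedding built from bounding boxes
    and class labels is then added back to restore the structural information
    (positions and semantics) that corruptions do not change.
    """

    def __init__(self, feat_dim=512, num_classes=151):
        super().__init__()
        # Affine-free IN keeps the normalization itself domain-agnostic.
        self.inst_norm = nn.InstanceNorm1d(feat_dim, affine=False)
        # Layout branch: 4 normalized box coordinates + a label embedding -> feature space.
        self.label_emb = nn.Embedding(num_classes, 128)
        self.layout_proj = nn.Sequential(
            nn.Linear(4 + 128, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, obj_feats, boxes, labels):
        # obj_feats: (num_objs, feat_dim) visual features from the detector backbone
        # boxes:     (num_objs, 4) boxes normalized to [0, 1], as [x1, y1, x2, y2]
        # labels:    (num_objs,) object class indices
        # InstanceNorm1d expects (N, C, L); here the image is the "instance" and
        # its objects play the role of spatial positions.
        x = obj_feats.t().unsqueeze(0)              # (1, feat_dim, num_objs)
        normed = self.inst_norm(x).squeeze(0).t()   # (num_objs, feat_dim)
        layout = self.layout_proj(torch.cat([boxes, self.label_emb(labels)], dim=-1))
        # Restitution: add the domain-invariant layout features back.
        return normed + layout

# Toy usage
module = LayoutNormRestitution()
feats = torch.randn(8, 512)            # 8 detected objects
boxes = torch.rand(8, 4)               # normalized box coordinates
labels = torch.randint(0, 151, (8,))   # Visual Genome-style class indices
robust_feats = module(feats, boxes, labels)
print(robust_feats.shape)              # torch.Size([8, 512])
```

In a full pipeline these restituted features would feed the paper's Layout-Embedded Encoder and the downstream predicate classifier; only the normalization-plus-restitution step is sketched here.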
Related papers
- MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing [0.08192907805418585]
We propose a unified framework that integrates object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery.
Our model demonstrates superior performance on the OPT-RSVG and DIOR-RSVG datasets.
arXiv Detail & Related papers (2025-03-31T15:36:41Z) - COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting [67.03992455145325]
3D segmentation based on 3D Gaussian Splatting (3DGS) struggles with accurately delineating object boundaries. We introduce Clear Object Boundaries for 3DGS (COB-GS), which aims to improve segmentation accuracy. For semantic guidance, we introduce a boundary-adaptive Gaussian splitting technique. For visual optimization, we rectify the degraded texture of the 3DGS scene.
arXiv Detail & Related papers (2025-03-25T08:31:43Z) - DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation [61.59996525424585]
DIFFVSGG is an online VSGG solution that frames this task as an iterative scene graph update problem.
We unify the decoding of three tasks (object classification, bounding box regression, and graph generation) using one shared feature embedding.
DIFFVSGG further facilitates continuous temporal reasoning, where predictions for subsequent frames leverage results of past frames as the conditional inputs of LDMs.
arXiv Detail & Related papers (2025-03-18T06:49:51Z) - AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation [40.149652254414185]
This paper constructs and releases an aerial image urban scene graph generation (AUG) dataset.
Images in the AUG dataset are captured from a low-altitude overhead view.
To avoid the local context being overwhelmed in complex aerial urban scenes, this paper proposes a new locality-preserving graph convolutional network (LPG).
arXiv Detail & Related papers (2024-04-11T14:29:30Z) - HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation [13.929906773382752]
A common approach to enable reasoning over visual data is Scene Graph Generation (SGG).
We propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset (see the corruption sketch after this list).
We show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks.
arXiv Detail & Related papers (2024-03-18T17:59:10Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Adaptive Self-training Framework for Fine-grained Scene Graph Generation [29.37568710952893]
Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets.
We introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets.
Our experiments verify the effectiveness of ST-SGG on various SGG models.
arXiv Detail & Related papers (2024-01-18T08:10:34Z) - Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention [69.36723767339001]
Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications.
We propose a unified framework named OvSGTR towards fully open-vocabulary SGG from a holistic view.
For the more challenging setting of relation-involved open-vocabulary SGG, the proposed approach integrates relation-aware pretraining.
arXiv Detail & Related papers (2023-11-18T06:49:17Z) - Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z) - Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) builds on top of detected objects to predict pairwise visual relations between them.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
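Several entries above, like HiKER-SGG's benchmark and the VG-C/GQA-C results in the main abstract, evaluate SGG models on procedurally corrupted images. As a rough, self-contained illustration of what such corruption pipelines do (not the actual VG-C or HiKER-SGG generation code; the corruption functions and severity scales below are assumptions), one can apply synthetic perturbations to clean images at graded severities:

```python
import numpy as np
from PIL import Image

def gaussian_noise(img, severity=1):
    """Additive Gaussian noise; the std grows with severity (illustrative scale)."""
    std = [0.04, 0.08, 0.12, 0.18, 0.26][severity - 1]
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = x + np.random.normal(scale=std, size=x.shape)
    return np.clip(x, 0.0, 1.0)

def fog(img, severity=1):
    """Crude fog stand-in: blend the image toward a uniform bright haze layer."""
    blend = [0.15, 0.25, 0.35, 0.45, 0.55][severity - 1]
    x = np.asarray(img, dtype=np.float32) / 255.0
    haze = np.full_like(x, 0.9)
    return np.clip((1.0 - blend) * x + blend * haze, 0.0, 1.0)

def corrupt_image(path, corruption, severity):
    """Load a clean image, apply one corruption at the given severity, return a PIL image."""
    img = Image.open(path).convert("RGB")
    out = corruption(img, severity=severity)
    return Image.fromarray((out * 255).astype(np.uint8))

# Hypothetical usage: build a small corrupted evaluation split from clean images.
# clean_paths = ["VG_100K/1.jpg", "VG_100K/2.jpg"]        # placeholder paths
# for path in clean_paths:
#     for severity in range(1, 6):
#         name = path.split("/")[-1]
#         corrupt_image(path, fog, severity).save(f"vg_c_like/fog_s{severity}_{name}")
```

A robustness benchmark of this kind then reports SGG metrics (e.g., mR@50) averaged over corruption types and severities on the perturbed images, while training stays on clean data.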