Related papers: RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation

URL: http://arxiv.org/abs/2511.08651v1
Date: Thu, 13 Nov 2025 01:01:30 GMT
Title: RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation
Authors: Hae-Won Jo, Yeong-Jun Cho
Abstract summary: Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. Existing methods are trained only on annotated object pairs and lack guidance for non-related pairs, making it difficult to identify meaningful relations during inference. We propose Relation Scoring Network (RS-Net), a modular framework that scores the contextual importance of object pairs using both spatial interactions and long-range temporal context.
Score: 1.7188280334580195
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. However, existing methods are trained only on annotated object pairs and lack guidance for non-related pairs, making it difficult to identify meaningful relations during inference. In this paper, we propose Relation Scoring Network (RS-Net), a modular framework that scores the contextual importance of object pairs using both spatial interactions and long-range temporal context. RS-Net consists of a spatial context encoder with learnable context tokens and a temporal encoder that aggregates video-level information. The resulting relation scores are integrated into a unified triplet scoring mechanism to enhance relation prediction. RS-Net can be easily integrated into existing DSGG models without architectural changes. Experiments on the Action Genome dataset show that RS-Net consistently improves both Recall and Precision across diverse baselines, with notable gains in mean Recall, highlighting its ability to address the long-tailed distribution of relations. Despite the increased number of parameters, RS-Net maintains competitive efficiency, achieving superior performance over state-of-the-art methods.
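The abstract says the relation scores are integrated into a unified triplet scoring mechanism but does not give the formula. The sketch below is one plausible reading, not the authors' method: it assumes the triplet score is the product of the subject, object, and predicate confidences with a per-pair relation score in [0, 1]. The names `Candidate`, `triplet_score`, `rank_triplets`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate <subject, predicate, object> triplet in a frame (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    subject_conf: float    # detector confidence for the subject
    object_conf: float     # detector confidence for the object
    predicate_conf: float  # classifier confidence for the predicate
    relation_score: float  # assumed pair-level importance score in [0, 1]


def baseline_score(c: Candidate) -> float:
    # Ranking score without any pair-level relation score.
    return c.subject_conf * c.object_conf * c.predicate_conf


def triplet_score(c: Candidate) -> float:
    # ASSUMPTION: the relation score enters as a multiplicative factor.
    # The paper's actual combination rule may differ.
    return baseline_score(c) * c.relation_score


def rank_triplets(cands: List[Candidate], top_k: int) -> List[Candidate]:
    # Keep the top_k candidates by the combined score, highest first.
    return sorted(cands, key=triplet_score, reverse=True)[:top_k]


# Illustrative values only: the sofa pair has confident detections
# but a low relation score, i.e. the pair is judged unlikely to be related.
candidates = [
    Candidate("person", "holding", "cup", 0.9, 0.8, 0.6, 0.9),
    Candidate("person", "looking_at", "sofa", 0.9, 0.9, 0.7, 0.1),
]

baseline_top = max(candidates, key=baseline_score)
ranked = rank_triplets(candidates, top_k=1)
print(baseline_top.obj, ranked[0].obj)
```

Under these made-up numbers the baseline ranks the sofa pair first, while the combined score promotes the cup pair. This is the kind of effect the abstract attributes to scoring non-related pairs.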

Related papers

Edge-Centric Relational Reasoning for 3D Scene Graph Prediction [74.19580969696898]
3D scene graph prediction aims to abstract complex 3D environments into structured graphs consisting of objects and their pairwise relationships. Existing approaches typically adopt object-centric graph neural networks, where relation edge features are iteratively updated by aggregating messages from connected object nodes. We propose a Link-guided Edge-centric relational reasoning framework with Object-aware fusion.
arXiv Detail & Related papers (2025-11-19T09:53:56Z)
SEP-GCN: Leveraging Similar Edge Pairs with Temporal and Spatial Contexts for Location-Based Recommender Systems [0.0]
We propose SEP-GCN, a novel graph-based recommendation framework that learns from pairs of contextually similar interaction edges. By identifying edge pairs that occur within similar temporal windows or geographic proximity, SEP-GCN augments the user-item graph with contextual similarity links. Experiments on benchmark data sets show that SEP-GCN consistently outperforms strong baselines in both predictive accuracy and robustness.
arXiv Detail & Related papers (2025-06-19T03:48:30Z)
RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. RelGNN is evaluated on 30 diverse real-world tasks from Relbench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%.
arXiv Detail & Related papers (2025-02-10T18:58:40Z)
Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling ultra-long frame information for association. In addition to the previous method of treating objects as nodes, the network also treats object trajectories as nodes for information interaction. By adding composite nodes, our model can learn better predictions over longer time scales.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation. Existing SGG methods have a limited ability to accurately predict detailed relationships. A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision. A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive. We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations. We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects. Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
Target Adaptive Context Aggregation for Video Scene Graph Generation [36.669700084337045]
This paper deals with the challenging task of video scene graph generation (VidSGG). We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking.
arXiv Detail & Related papers (2021-08-18T12:46:28Z)
Improved Representation Learning for Session-based Recommendation [0.0]
Session-based recommendation systems suggest relevant items to users by modeling user behavior and preferences using short-term anonymous sessions. Existing methods leverage Graph Neural Networks (GNNs) that propagate and aggregate information from neighboring nodes. We propose using a Transformer in combination with a target attentive GNN, which allows richer Representation Learning.
arXiv Detail & Related papers (2021-07-04T00:57:28Z)
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph. Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.