Related papers: Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

URL: http://arxiv.org/abs/2112.09828v1
Date: Sat, 18 Dec 2021 03:02:11 GMT
Title: Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
Authors: Shengyu Feng, Subarna Tripathi, Hesham Mostafa, Marcel Nassar, Somdeb Majumdar
Abstract summary: We show that capturing long-term dependencies is the key to effective generation of dynamic scene graphs. Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods.
Score: 15.614710220461353
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structured video representation in the form of dynamic scene graphs is an effective tool for several video understanding tasks. Compared to the task of scene graph generation from images, dynamic scene graph generation is more challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We show that capturing long-term dependencies is the key to effective generation of dynamic scene graphs. We present the detect-track-recognize paradigm by constructing consistent long-term object tracklets from a video, followed by transformers to capture the dynamics of objects and visual relations. Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. We also perform ablation studies and validate the effectiveness of each component of the proposed approach.

Related papers

THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage [11.587822611656648]
We introduce the Temporal Hierarchical Cyclic Scene Graph (THYME) approach, which integrates hierarchical feature aggregation with cyclic temporal refinement to address limitations. THYME effectively models multi-scale spatial context and enforces temporal consistency across frames, yielding more accurate and coherent scene graphs. In addition, we present AeroEye-v1.0, a novel aerial video dataset enriched with five types of interactivity that overcomes the constraints of existing datasets.
arXiv Detail & Related papers (2025-07-12T08:43:38Z)
FDSG: Forecasting Dynamic Scene Graphs [41.18167591493808]
We propose a novel framework that predicts future entity labels, bounding boxes, and relationships for unobserved frames. A temporal aggregation module further refines predictions by integrating forecasted and observed information via cross-attention. Experiments on Action Genome show that FDSG outperforms state-of-the-art methods on dynamic scene graph generation, scene graph anticipation, and scene graph forecasting.
arXiv Detail & Related papers (2025-06-02T09:46:22Z)
ScaDyG:A New Paradigm for Large-scale Dynamic Graph Learning [31.629956388962814]
ScaDyG is a time-aware scalable learning paradigm for dynamic graph networks. Experiments on 12 datasets demonstrate that ScaDyG performs comparably to or even outperforms other SOTA methods in both node-level and link-level downstream tasks.
arXiv Detail & Related papers (2025-01-27T12:39:16Z)
Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence. Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z)
Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation [1.6584112749108326]
TCDSG, Temporally Consistent Dynamic Scene Graphs, is an end-to-end framework that detects, tracks, and links subject-object relationships across time. Our work sets a new standard in multi-frame video analysis, opening new avenues for high-impact applications in surveillance, autonomous navigation, and beyond.
arXiv Detail & Related papers (2024-12-03T20:19:20Z)
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation [10.678727237318503]
Impar is a novel training framework that leverages curriculum learning and loss masking to mitigate bias in generation and anticipation modelling. We introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, designed to evaluate the robustness of STSG models against distribution shifts.
arXiv Detail & Related papers (2024-11-20T06:15:28Z)
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
Retrieval Augmented Generation for Dynamic Graph Modeling [15.09162213134372]
Dynamic graph modeling is crucial for analyzing evolving patterns in various applications. Existing approaches often integrate graph neural networks with temporal modules or redefine dynamic graph modeling as a generative sequence task. We introduce the Retrieval-Augmented Generation for Dynamic Graph Modeling (RAG4DyG) framework, which leverages guidance from contextually and temporally analogous examples.
arXiv Detail & Related papers (2024-08-26T09:23:35Z)
TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z)
Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces local interaction information and global human-action interaction information. Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and prevent the model from failing to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning [92.71579608528907]
This paper aims to design an easy-to-use pipeline (termed EasyDGL) composed of three key modules with both strong fitting ability and interpretability. EasyDGL can effectively quantify the predictive power of the frequency content that a model learns from the evolving graph data.
arXiv Detail & Related papers (2023-03-22T06:35:08Z)
Time-aware Dynamic Graph Embedding for Asynchronous Structural Evolution [60.695162101159134]
Existing works merely view a dynamic graph as a sequence of changes. We formulate dynamic graphs as temporal edge sequences associated with the joining time of vertices and the timespan of edges (ToE). A time-aware Transformer is proposed to embed vertices' dynamic connections and ToEs into the learned vertex representations.
arXiv Detail & Related papers (2022-07-01T15:32:56Z)
Efficient Dynamic Graph Representation Learning at Scale [66.62859857734104]
We propose Efficient Dynamic Graph lEarning (EDGE), which selectively expresses certain temporal dependency via training loss to improve the parallelism in computations. We show that EDGE can scale to dynamic graphs with millions of nodes and hundreds of millions of temporal events and achieve new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-12-14T22:24:53Z)
Event Detection on Dynamic Graphs [4.128347119808724]
Event detection is a critical task for timely decision-making in graph analytics applications. We propose DyGED, a simple yet novel deep learning model for event detection on dynamic graphs.
arXiv Detail & Related papers (2021-10-23T05:52:03Z)
Spatial-Temporal Transformer for Dynamic Scene Graph Generation [34.190733855032065]
We propose a neural network that consists of two core modules: (1) a spatial encoder that takes an input frame to extract spatial context and reason about the visual relationships within a frame, and (2) a temporal decoder which takes the output of the spatial encoder as input. Our method is validated on the benchmark dataset Action Genome (AG).
arXiv Detail & Related papers (2021-07-26T16:30:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.