THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage
- URL: http://arxiv.org/abs/2507.09200v1
- Date: Sat, 12 Jul 2025 08:43:38 GMT
- Title: THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage
- Authors: Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren, Alper Yilmaz, Minh-Triet Tran, Khoa Luu,
- Abstract summary: We introduce the Temporal Hierarchical Cyclic Scene Graph (THYME) approach, which integrates hierarchical feature aggregation with cyclic temporal refinement to address limitations.<n>THYME effectively models multi-scale spatial context and enforces temporal consistency across frames, yielding more accurate and coherent scene graphs.<n>In addition, we present AeroEye-v1.0, a novel aerial video dataset enriched with five types of interactivity that overcomes the constraints of existing datasets.
- Score: 11.587822611656648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid proliferation of video in applications such as autonomous driving, surveillance, and sports analytics necessitates robust methods for dynamic scene understanding. Despite advances in static scene graph generation and early attempts at video scene graph generation, previous methods often suffer from fragmented representations, failing to capture fine-grained spatial details and long-range temporal dependencies simultaneously. To address these limitations, we introduce the Temporal Hierarchical Cyclic Scene Graph (THYME) approach, which synergistically integrates hierarchical feature aggregation with cyclic temporal refinement to address these limitations. In particular, THYME effectively models multi-scale spatial context and enforces temporal consistency across frames, yielding more accurate and coherent scene graphs. In addition, we present AeroEye-v1.0, a novel aerial video dataset enriched with five types of interactivity that overcome the constraints of existing datasets and provide a comprehensive benchmark for dynamic scene graph generation. Empirically, extensive experiments on ASPIRe and AeroEye-v1.0 demonstrate that the proposed THYME approach outperforms state-of-the-art methods, offering improved scene understanding in ground-view and aerial scenarios.
Related papers
- Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better [61.381599921020175]
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts.<n>Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion.<n>We propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks.
arXiv Detail & Related papers (2025-03-25T17:58:48Z) - Salient Temporal Encoding for Dynamic Scene Graph Generation [0.765514655133894]
We propose a novel spatial-temporal scene graph generation method that selectively builds temporal connections only between temporal-relevant objects pairs.<n>The resulting sparse and explicit temporal representation allows us to improve upon strong scene graph generation baselines by up to $4.4%$ in Scene Graph Detection.
arXiv Detail & Related papers (2025-03-15T08:01:36Z) - Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence.<n>Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z) - Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation [1.6584112749108326]
TCDSG, Temporally Consistent Dynamic Scene Graphs, is an end-to-end framework that detects, tracks, and links subject-object relationships across time.<n>Our work sets a new standard in multi-frame video analysis, opening new avenues for high-impact applications in surveillance, autonomous navigation, and beyond.
arXiv Detail & Related papers (2024-12-03T20:19:20Z) - CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos [9.807247838436489]
We introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos.
We propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies.
The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order.
arXiv Detail & Related papers (2024-06-03T06:24:55Z) - TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z) - Local-Global Information Interaction Debiasing for Dynamic Scene Graph
Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and avoid the model being unable to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z) - Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic
Role Labeling [96.64607294592062]
Video Semantic Label Roleing (VidSRL) aims to detect salient events from given videos.
Recent endeavors have put forth methods for VidSRL, but they can be subject to two key drawbacks.
arXiv Detail & Related papers (2023-08-09T17:20:14Z) - Time-aware Dynamic Graph Embedding for Asynchronous Structural Evolution [60.695162101159134]
Existing works merely view a dynamic graph as a sequence of changes.
We formulate dynamic graphs as temporal edge sequences associated with joining time of.
vertex and timespan of edges.
A time-aware Transformer is proposed to embed.
vertex' dynamic connections and ToEs into the learned.
vertex representations.
arXiv Detail & Related papers (2022-07-01T15:32:56Z) - Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs [15.614710220461353]
We show that capturing long-term dependencies is the key to effective generation of dynamic scene graphs.
Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-18T03:02:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.