Towards Scene Graph Anticipation
- URL: http://arxiv.org/abs/2403.04899v2
- Date: Fri, 19 Jul 2024 12:40:28 GMT
- Title: Towards Scene Graph Anticipation
- Authors: Rohith Peddi, Saksham Singh, Saurabh, Parag Singla, Vibhav Gogate,
- Abstract summary: We introduce the task of Scene Graph Anticipation (SGA).
We adapt state-of-the-art scene graph generation methods as baselines to anticipate future pair-wise relationships between objects.
In SceneSayer, we leverage object-centric representations of relationships to reason about the observed video frames and model the evolution of relationships between objects.
- Score: 10.678727237318503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatio-temporal scene graphs represent interactions in a video by decomposing scenes into individual objects and their pair-wise temporal relationships. Long-term anticipation of the fine-grained pair-wise relationships between objects is a challenging problem. To this end, we introduce the task of Scene Graph Anticipation (SGA). We adapt state-of-the-art scene graph generation methods as baselines to anticipate future pair-wise relationships between objects and propose a novel approach SceneSayer. In SceneSayer, we leverage object-centric representations of relationships to reason about the observed video frames and model the evolution of relationships between objects. We take a continuous time perspective and model the latent dynamics of the evolution of object interactions using concepts of NeuralODE and NeuralSDE, respectively. We infer representations of future relationships by solving an Ordinary Differential Equation and a Stochastic Differential Equation, respectively. Extensive experimentation on the Action Genome dataset validates the efficacy of the proposed methods.
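The continuous-time view in the abstract can be illustrated with a small sketch. This is not SceneSayer's implementation: the `drift` function, the step size, and the noise scale `sigma` are hypothetical stand-ins for what would be learned networks in a NeuralODE or NeuralSDE. The sketch only shows the general idea of rolling a latent relationship representation forward in time, with explicit Euler steps for the ODE case and Euler-Maruyama steps for the SDE case.

```python
import math
import random


def drift(z):
    """Hypothetical drift term; a learned network would take its place."""
    return [-0.5 * v for v in z]


def euler_ode(z0, horizon, dt=0.1):
    """Integrate dz/dt = drift(z) from t=0 to t=horizon with explicit Euler."""
    z = list(z0)
    for _ in range(round(horizon / dt)):
        dz = drift(z)
        z = [v + dt * d for v, d in zip(z, dz)]
    return z


def euler_maruyama_sde(z0, horizon, dt=0.1, sigma=0.1, rng=None):
    """Integrate dz = drift(z) dt + sigma dW with Euler-Maruyama steps."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    z = list(z0)
    for _ in range(round(horizon / dt)):
        dz = drift(z)
        z = [
            v + dt * d + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for v, d in zip(z, dz)
        ]
    return z


# Latent representation of one object pair at the last observed frame
# (values are made up for illustration).
z_observed = [1.0, -2.0]
z_future_ode = euler_ode(z_observed, horizon=1.0)
z_future_sde = euler_maruyama_sde(z_observed, horizon=1.0)
```

In a full model, the future latent state would be decoded into relationship predicates for the object pair; the SDE variant yields a different sample per noise draw, which is one way to express uncertainty about the future.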
Related papers
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multiobject relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces local interaction information and global human-action interaction information.
Long-term human actions supervise the model to generate multiple scene graphs that conform to the global constraints and prevent the model from failing to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs [16.760066844287046]
We propose a Time-variant Relation-aware TRansformer (TR$^2$) to model the temporal change of relations in dynamic scene graphs.
We show that TR$^2$ significantly outperforms previous state-of-the-art methods under two different settings.
arXiv Detail & Related papers (2023-05-15T10:30:38Z)
- Spatio-Temporal Joint Graph Convolutional Networks for Traffic Forecasting [75.10017445699532]
Recent studies have shifted their focus towards formulating traffic forecasting as a spatio-temporal graph modeling problem.
We propose a novel approach for accurate traffic forecasting on road networks over multiple future time steps.
arXiv Detail & Related papers (2021-11-25T08:45:14Z)
- Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition [55.7731053128204]
In a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video.
With effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies.
The full use of appearance features, spatial location, and semantic information is also key to improving video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models spatio-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
- Learning Temporal Dynamics from Cycles in Narrated Video [85.89096034281694]
We propose a self-supervised solution to the problem of learning to model how the world changes as time elapses.
Our model learns modality-agnostic functions to predict forward and backward in time, which must undo each other when composed.
We apply the learned dynamics model without further training to various tasks, such as predicting future action and temporally ordering sets of images.
arXiv Detail & Related papers (2021-01-07T02:41:32Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- Jointly Modeling Aspect and Sentiment with Dynamic Heterogeneous Graph Neural Networks [27.59070337052869]
Target-Based Sentiment Analysis aims to detect the opinion aspects (aspect extraction) and the sentiment polarities (sentiment detection) towards them.
Both the previous pipeline and integrated methods fail to precisely model the innate connection between these two objectives.
We propose a novel dynamic heterogeneous graph to jointly model the two objectives in an explicit way.
arXiv Detail & Related papers (2020-04-14T11:27:30Z)