FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
- URL: http://arxiv.org/abs/2310.16073v3
- Date: Fri, 12 Apr 2024 17:04:15 GMT
- Title: FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
- Authors: Anant Khandelwal,
- Abstract summary: FloCoDe: Flow-aware Temporal and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs.
We propose correlation debiasing and a correlation-based loss to learn unbiased relation representations for long-tailed classes.
- Score: 14.50214193838818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic scene graph generation (SGG) from videos requires not only a comprehensive understanding of objects across scenes but also a method to capture the temporal motions and interactions with different objects. Moreover, the long-tailed distribution of visual relationships is a crucial bottleneck for most dynamic SGG methods. This is because many of them focus on capturing spatio-temporal context using complex architectures, leading to the generation of biased scene graphs. To address these challenges, we propose FloCoDe: Flow-aware Temporal Consistency and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs. FloCoDe employs feature warping using flow to detect temporally consistent objects across frames. To address the long-tail issue of visual relationships, we propose correlation debiasing and a label correlation-based loss to learn unbiased relation representations for long-tailed classes. Specifically, we propose to incorporate label correlations using contrastive loss to capture commonly co-occurring relations, which aids in learning robust representations for long-tailed classes. Further, we adopt the uncertainty attenuation-based classifier framework to handle noisy annotations in the SGG data. Extensive experimental evaluation shows a performance gain as high as 4.1%, demonstrating the superiority of generating more unbiased scene graphs.
Related papers
- TD^2-Net: Toward Denoising and Debiasing for Dynamic Scene Graph
Generation [76.24766055944554]
We introduce a network named TD$2$-Net that aims at denoising and debiasing for dynamic SGG.
TD$2$-Net outperforms the second-best competitors by 12.7 % on mean-Recall@10 for predicate classification.
arXiv Detail & Related papers (2024-01-23T04:17:42Z) - TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z) - Temporal Smoothness Regularisers for Neural Link Predictors [8.975480841443272]
We show that a simple method like TNTComplEx can produce significantly more accurate results than state-of-the-art methods.
We also evaluate the impact of a wide range of temporal smoothing regularisers on two state-of-the-art temporal link prediction models.
arXiv Detail & Related papers (2023-09-16T16:52:49Z) - Vision Relation Transformer for Unbiased Scene Graph Generation [31.29954125135073]
Current Scene Graph Generation (SGG) methods suffer from an information loss regarding the entities local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47 percentage over the state of the art while being 10 times smaller.
arXiv Detail & Related papers (2023-08-18T11:15:31Z) - Local-Global Information Interaction Debiasing for Dynamic Scene Graph
Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and avoid the model being unable to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z) - Unbiased Scene Graph Generation in Videos [36.889659781604564]
We introduce TEMPURA: TEmporal consistency and Memory-guided UnceRtainty Attenuation for unbiased dynamic SGG.
TEMPURA employs object-level temporal consistencies via transformer sequence modeling, learns to synthesize unbiased relationship representations.
Our method achieves significant (up to 10% in some cases) performance gain over existing methods.
arXiv Detail & Related papers (2023-04-03T06:10:06Z) - CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph
Generation [10.724516317292926]
We present a simple yet effective method called Context-Aware Mixture-of-Experts (CAME) to improve the model diversity and alleviate the biased scene graph generator.
We have conducted extensive experiments on three tasks on the Visual Genome dataset to show that came achieved superior performance over previous methods.
arXiv Detail & Related papers (2022-08-15T10:39:55Z) - Hyper-relationship Learning Network for Scene Graph Generation [95.6796681398668]
We propose a hyper-relationship learning network, termed HLN, for scene graph generation.
We evaluate HLN on the most popular SGG dataset, i.e., the Visual Genome dataset.
For example, the proposed HLN improves the recall per relationship from 11.3% to 13.1%, and maintains the recall per image from 19.8% to 34.9%.
arXiv Detail & Related papers (2022-02-15T09:26:16Z) - Recovering the Unbiased Scene Graphs from the Biased Ones [99.24441932582195]
We show that due to the missing labels, scene graph generation (SGG) can be viewed as a "Learning from Positive and Unlabeled data" (PU learning) problem.
We propose Dynamic Label Frequency Estimation (DLFE) to take advantage of training-time data augmentation and average over multiple training iterations to introduce more valid examples.
Extensive experiments show that DLFE is more effective in estimating label frequencies than a naive variant of the traditional estimate, and DLFE significantly alleviates the long tail.
arXiv Detail & Related papers (2021-07-05T16:10:41Z) - Addressing Class Imbalance in Scene Graph Parsing by Learning to
Contrast and Score [65.18522219013786]
Scene graph parsing aims to detect objects in an image scene and recognize their relations.
Recent approaches have achieved high average scores on some popular benchmarks, but fail in detecting rare relations.
This paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem.
arXiv Detail & Related papers (2020-09-28T13:57:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.