Vision Relation Transformer for Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2308.09472v1
- Date: Fri, 18 Aug 2023 11:15:31 GMT
- Title: Vision Relation Transformer for Unbiased Scene Graph Generation
- Authors: Gopika Sudhakaran, Devendra Singh Dhami, Kristian Kersting, Stefan
Roth
- Abstract summary: Current Scene Graph Generation (SGG) methods suffer from an information loss regarding entities' local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47 percentage points over the state of the art while being 10 times smaller.
- Score: 31.29954125135073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen a growing interest in Scene Graph Generation (SGG), a
comprehensive visual scene understanding task that aims to predict entity
relationships using a relation encoder-decoder pipeline stacked on top of an
object encoder-decoder backbone. Unfortunately, current SGG methods suffer from
an information loss regarding entities' local-level cues during the relation
encoding process. To mitigate this, we introduce the Vision rElation
TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We further observe that many existing SGG methods claim to be unbiased, but are
still biased towards either head or tail classes. To overcome this bias, we
introduce a Mutually Exclusive ExperT (MEET) learning strategy that captures
important relation features without bias towards head or tail classes.
Experimental results on the VG and GQA datasets demonstrate that VETO + MEET
boosts the predictive performance by up to 47 percentage points over the state of the
art while being 10 times smaller.
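Neither the local-level encoder nor MEET is specified in detail in this abstract, so the following is a minimal sketch of one plausible reading of "mutually exclusive experts": each expert owns a disjoint slice of the predicate classes, so no single expert is dominated by the head of the label distribution. The three-way head/body/tail split and the scatter-based fusion are illustrative assumptions, not the paper's exact design.
```python
import torch
import torch.nn as nn

class MutuallyExclusiveExperts(nn.Module):
    """Toy sketch: each expert classifies a disjoint group of predicate
    classes (e.g. head / body / tail), so no single expert is dominated
    by the head of the label distribution. The grouping is an assumption."""

    def __init__(self, feat_dim, class_groups):
        super().__init__()
        self.class_groups = class_groups
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, len(g)) for g in class_groups]
        )

    def forward(self, relation_feats):
        num_classes = sum(len(g) for g in self.class_groups)
        # Scatter each expert's group logits back into one full logit vector
        logits = relation_feats.new_zeros(relation_feats.size(0), num_classes)
        for expert, group in zip(self.experts, self.class_groups):
            logits[:, group] = expert(relation_feats)
        return logits

# Usage: 3 experts over 51 Visual Genome predicate classes (hypothetical split)
model = MutuallyExclusiveExperts(
    feat_dim=256,
    class_groups=[list(range(0, 11)), list(range(11, 31)), list(range(31, 51))],
)
logits = model(torch.randn(4, 256))  # shape (4, 51)
```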
Related papers
- FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing [14.50214193838818]
FloCoDe combines flow-aware temporal consistency and correlation debiasing with uncertainty attenuation to produce unbiased dynamic scene graphs.
We propose correlation debiasing and a correlation-based loss to learn unbiased relation representations for long-tailed classes.
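The exact form of the correlation-based loss is not given in this summary; as a stand-in in the same debiasing spirit, here is a minimal sketch that reweights cross-entropy with effective-number class weights (Cui et al., 2019) to upweight long-tailed relation classes:
```python
import torch
import torch.nn.functional as F

def reweighted_relation_loss(logits, targets, class_freq, beta=0.999):
    # class_freq: 1-D tensor of training-set counts per relation class
    class_freq = class_freq.to(logits.device).float()
    # Effective number of samples per class: (1 - beta^n) / (1 - beta)
    eff_num = (1.0 - torch.pow(beta, class_freq)) / (1.0 - beta)
    weights = 1.0 / eff_num
    weights = weights * len(weights) / weights.sum()  # normalize to mean 1
    return F.cross_entropy(logits, targets, weight=weights)
```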
arXiv Detail & Related papers (2023-10-24T14:59:51Z) - Head-Tail Cooperative Learning Network for Unbiased Scene Graph Generation [30.467562472064177]
Current unbiased Scene Graph Generation (SGG) methods overlook the substantial sacrifice they make in predicting head predicates.
We propose a model-agnostic Head-Tail Collaborative Learning network that includes head-prefer and tail-prefer feature representation branches.
Our method achieves higher mean Recall with a minimal sacrifice in Recall and achieves a new state-of-the-art overall performance.
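A minimal sketch of what a head-prefer/tail-prefer branch pair could look like, under the assumption that one branch is trained with plain cross-entropy and the other with a reweighted loss, with logits averaged at inference; the paper's actual cooperative fusion is not specified here:
```python
import torch
import torch.nn as nn

class HeadTailBranches(nn.Module):
    """Sketch of a head-prefer / tail-prefer design (details assumed):
    one branch trained with plain cross-entropy favors frequent
    predicates, the other trained with a reweighted loss favors rare
    ones; at inference their logits are averaged."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.head_branch = nn.Linear(feat_dim, num_classes)
        self.tail_branch = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        head_logits = self.head_branch(feats)
        tail_logits = self.tail_branch(feats)
        if self.training:
            return head_logits, tail_logits  # separate losses per branch
        return 0.5 * (head_logits + tail_logits)  # cooperative fusion
```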
arXiv Detail & Related papers (2023-08-23T10:29:25Z) - Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation [87.13847750383778]
We propose a Dual-branch Hybrid Learning network (DHL) to handle both head and tail predicates for Scene Graph Generation (SGG).
We show that our approach achieves a new state-of-the-art performance on VG and GQA datasets.
arXiv Detail & Related papers (2022-07-16T11:53:50Z) - Learning To Generate Scene Graph from Head to Tail [65.48134724633472]
We propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT)
CRM first learns head/easy samples to obtain robust features for head predicates, then gradually shifts focus to tail/hard ones.
SCM relieves semantic deviation by ensuring semantic consistency between the generated scene graph and the ground truth in both global and local representations.
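A hedged sketch of a head-to-tail curriculum in the spirit of CRM (the actual schedule is an assumption): sampling weights interpolate from frequency-proportional (head first) to inverse-frequency (tail focus) as training progresses.
```python
import numpy as np

def curriculum_sample_weights(class_freq, progress):
    # progress in [0, 1]: fraction of training completed
    freq = np.asarray(class_freq, dtype=np.float64)
    head_w = freq / freq.sum()        # favors frequent (head) predicates
    tail_w = 1.0 / freq
    tail_w = tail_w / tail_w.sum()    # favors rare (tail) predicates
    w = (1.0 - progress) * head_w + progress * tail_w
    return w / w.sum()

# Usage: epoch 3 of 10, three predicate classes with counts 1000 / 100 / 10
weights = curriculum_sample_weights([1000, 100, 10], progress=3 / 10)
```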
arXiv Detail & Related papers (2022-06-23T12:16:44Z) - HL-Net: Heterophily Learning Network for Scene Graph Generation [90.2766568914452]
We propose a novel Heterophily Learning Network (HL-Net) to explore the homophily and heterophily between objects/relationships in scene graphs.
HL-Net comprises an adaptive reweighting transformer module, which adaptively integrates information from different layers to exploit both heterophily and homophily among objects.
We conducted extensive experiments on two public datasets: Visual Genome (VG) and Open Images (OI).
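A minimal sketch of adaptively integrating information from different layers, assuming learned softmax weights over per-layer outputs; HL-Net's actual module is more elaborate:
```python
import torch
import torch.nn as nn

class AdaptiveLayerReweighting(nn.Module):
    """Sketch: mix features from different transformer layers with
    learned softmax weights (a simplified reading of HL-Net's adaptive
    reweighting module, not its exact design)."""

    def __init__(self, num_layers):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, tokens, dim) tensors, one per layer
        stacked = torch.stack(layer_outputs, dim=0)   # (L, B, T, D)
        w = torch.softmax(self.layer_logits, dim=0)   # (L,)
        return torch.einsum("l,lbtd->btd", w, stacked)
```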
arXiv Detail & Related papers (2022-05-03T06:00:29Z) - Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
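A minimal sketch of one hybrid-attention cell, under the assumption that "intra-modal refinement" is self-attention within a modality and "inter-modal interaction" is cross-attention to the other modality:
```python
import torch
import torch.nn as nn

class HybridAttentionCell(nn.Module):
    """Sketch (simplified assumption): self-attention refines features
    within a modality, then cross-attention lets them attend to the
    other modality."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, other):
        # Intra-modal refinement (e.g. visual tokens attend to themselves)
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + h)
        # Inter-modal interaction (e.g. visual tokens attend to semantics)
        h, _ = self.cross_attn(x, other, other)
        return self.norm2(x + h)

# Usage: visual and semantic token sequences of width 256
cell = HybridAttentionCell(256)
out = cell(torch.randn(2, 36, 256), torch.randn(2, 36, 256))
```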
arXiv Detail & Related papers (2022-03-18T09:14:13Z) - CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation [23.55530043171931]
Scene Graph Generation (SGG) performs unsatisfactorily when faced with biased data in real-world scenarios.
We propose a novel debiasing Cognition Tree (CogTree) loss for unbiased SGG.
The loss is model-agnostic and consistently boosts the performance of several state-of-the-art models.
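A hedged two-level sketch of a tree-structured loss in the spirit of CogTree (the paper's actual cognition tree is built from model predictions and is deeper): the model is penalized both for picking the wrong coarse group of predicates and for picking the wrong predicate within the group.
```python
import torch
import torch.nn.functional as F

def coarse_to_fine_loss(logits, targets, group_of):
    # group_of: LongTensor mapping each fine class id -> coarse group id
    group_of = group_of.to(logits.device)
    num_groups = int(group_of.max().item()) + 1
    # Coarse logit per group = log-sum-exp of the fine logits in that group
    coarse = torch.stack(
        [torch.logsumexp(logits[:, group_of == g], dim=1)
         for g in range(num_groups)],
        dim=1,
    )
    coarse_loss = F.cross_entropy(coarse, group_of[targets])
    fine_loss = F.cross_entropy(logits, targets)
    return coarse_loss + fine_loss
```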
arXiv Detail & Related papers (2020-09-16T07:47:26Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
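A sketch of how correlation-derived loss weights might be computed, assuming predicates whose class prototypes are highly correlated with others are harder to distinguish and therefore get larger weights; the scheme and the `class_prototypes` input are illustrative assumptions, not PCPL's exact formulation:
```python
import torch
import torch.nn.functional as F

def correlation_based_weights(class_prototypes, tau=1.0):
    # class_prototypes: (C, D) tensor, one feature prototype per predicate
    protos = F.normalize(class_prototypes, dim=1)
    corr = protos @ protos.t()        # cosine similarity between predicates
    corr.fill_diagonal_(0.0)
    # Mean correlation with all other predicates, mapped to loss weights
    hardness = corr.mean(dim=1)
    weights = torch.softmax(hardness / tau, dim=0) * len(hardness)
    return weights  # pass as `weight=` to F.cross_entropy
```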
arXiv Detail & Related papers (2020-09-02T08:30:09Z) - What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
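For context, a minimal sketch of the standard InfoNCE objective between two views, which this line of work builds on: the view generators are then trained to reduce the mutual information the views share while a downstream task keeps the relevant bits.
```python
import torch
import torch.nn.functional as F

def info_nce(view1, view2, temperature=0.1):
    # view1, view2: (B, D) embeddings of two views of the same batch
    z1 = F.normalize(view1, dim=1)
    z2 = F.normalize(view2, dim=1)
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)   # positives on the diagonal
```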
arXiv Detail & Related papers (2020-05-20T17:59:57Z)