Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2207.07913v1
- Date: Sat, 16 Jul 2022 11:53:50 GMT
- Title: Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation
- Authors: Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El
Saddik, Heng Tao Shen
- Abstract summary: We propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for Scene Graph Generation (SGG).
We show that our approach achieves a new state-of-the-art performance on VG and GQA datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current studies of Scene Graph Generation (SGG) focus on solving the
long-tailed problem for generating unbiased scene graphs. However, most
de-biasing methods overemphasize the tail predicates and underestimate head
ones throughout training, thereby wrecking the representation ability of head
predicate features. Furthermore, these impaired features from head predicates
harm the learning of tail predicates. In fact, the inference of tail predicates
heavily depends on the general patterns learned from head ones, e.g., "standing
on" depends on "on". Thus, these de-biasing SGG methods can achieve neither
excellent performance on tail predicates nor satisfactory performance on head ones.
To address this issue, we propose a Dual-branch Hybrid Learning network (DHL)
to take care of both head predicates and tail ones for SGG, including a
Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB).
Specifically, the CLB is responsible for learning expert and robust features
of head predicates, while the FLB is expected to predict informative tail
predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule
(BCS) to make the two branches work well together. Experiments show that our
approach achieves a new state-of-the-art performance on VG and GQA datasets and
achieves a better trade-off between the performance on tail and head predicates.
Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning
and Sentence-to-Graph Retrieval) further verify the generalization and
practicability of our method.
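The abstract states that a Branch Curriculum Schedule (BCS) coordinates the two branches, but does not give the schedule's form. A minimal sketch of one plausible realization, assuming a simple linear ramp that shifts emphasis from the Coarse-grained Learning Branch (CLB) to the Fine-grained Learning Branch (FLB); the function names and the linear form are illustrative assumptions, not the paper's method:

```python
def bcs_weights(epoch: int, total_epochs: int) -> tuple[float, float]:
    """Return (clb_weight, flb_weight) for the current epoch.

    Hypothetical linear schedule: early training favors the coarse
    branch (head predicates); late training favors the fine branch
    (tail predicates).
    """
    t = epoch / max(total_epochs - 1, 1)  # training progress in [0, 1]
    return 1.0 - t, t


def hybrid_loss(clb_loss: float, flb_loss: float,
                epoch: int, total_epochs: int) -> float:
    """Blend the two branch losses according to the schedule."""
    clb_w, flb_w = bcs_weights(epoch, total_epochs)
    return clb_w * clb_loss + flb_w * flb_loss
```

Under this sketch, training starts fully weighted toward the CLB and ends fully weighted toward the FLB; any smooth monotone ramp (cosine, sigmoid) would serve the same coordinating role.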
Related papers
- Head-Tail Cooperative Learning Network for Unbiased Scene Graph
Generation [30.467562472064177]
Current unbiased Scene Graph Generation (SGG) methods ignore the substantial sacrifice in the prediction of head predicates.
We propose a model-agnostic Head-Tail Collaborative Learning network that includes head-prefer and tail-prefer feature representation branches.
Our method achieves higher mean Recall with minimal sacrifice in Recall, reaching a new state-of-the-art overall performance.
arXiv Detail & Related papers (2023-08-23T10:29:25Z) - Vision Relation Transformer for Unbiased Scene Graph Generation [31.29954125135073]
Current Scene Graph Generation (SGG) methods suffer from an information loss regarding the entities' local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47 percentage points over the state of the art while being 10 times smaller.
arXiv Detail & Related papers (2023-08-18T11:15:31Z) - Feature Fusion from Head to Tail for Long-Tailed Visual Recognition [39.86973663532936]
The biased decision boundary caused by inadequate semantic information in tail classes is one of the key factors contributing to their low recognition accuracy.
We propose to augment tail classes by grafting the diverse semantic information from head classes, referred to as head-to-tail fusion (H2T).
Both theoretical analysis and practical experimentation demonstrate that H2T can contribute to a more optimized solution for the decision boundary.
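The abstract describes grafting head-class semantic information onto tail classes but not the grafting rule itself. A hedged, self-contained sketch of one possible reading, replacing a fraction of a tail-class feature's channels with the corresponding head-class channels; the channel-replacement rule and the `ratio` parameter are assumptions for illustration:

```python
import random


def h2t_fuse(tail_feat: list[float], head_feat: list[float],
             ratio: float = 0.3) -> list[float]:
    """Graft a random subset of head-class feature channels onto a
    tail-class feature vector (hypothetical H2T-style fusion)."""
    assert len(tail_feat) == len(head_feat)
    n = len(tail_feat)
    k = int(n * ratio)                 # number of channels to graft
    grafted = random.sample(range(n), k)
    fused = list(tail_feat)
    for i in grafted:
        fused[i] = head_feat[i]        # copy the head channel over
    return fused
```

The intent this sketch captures is that tail-class features borrow semantic diversity from head classes, which can push the decision boundary toward a better-calibrated position.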
arXiv Detail & Related papers (2023-06-12T08:50:46Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - Learning To Generate Scene Graph from Head to Tail [65.48134724633472]
We propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT).
CRM first learns head/easy samples for robust features of head predicates and then gradually focuses on tail/hard ones.
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
arXiv Detail & Related papers (2022-06-23T12:16:44Z) - Fine-Grained Predicates Learning for Scene Graph Generation [155.48614435437355]
Fine-Grained Predicates Learning aims at differentiating among hard-to-distinguish predicates for Scene Graph Generation task.
We introduce a Predicate Lattice that helps SGG models to figure out fine-grained predicate pairs.
We then propose a Category Discriminating Loss and an Entity Discriminating Loss, which both contribute to distinguishing fine-grained predicates.
arXiv Detail & Related papers (2022-04-06T06:20:09Z) - Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased
Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph
Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
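The PCPL abstract says only that loss weights are sought adaptively from predicate correlations. A minimal sketch under stated assumptions: weights grow as a predicate becomes less correlated with the others (i.e., more independent and harder to infer from context) and are normalized to average 1. The formula, the `alpha` exponent, and the function name are hypothetical, not PCPL's actual scheme:

```python
def correlation_based_weights(correlation: list[float],
                              alpha: float = 1.0) -> list[float]:
    """Map per-predicate correlation scores in [0, 1] to loss weights.

    Assumed rule: weight ~ (1 - correlation)^alpha, so weakly
    correlated (more independent) predicates receive larger weights;
    weights are normalized to have mean 1.
    """
    raw = [(1.0 - c) ** alpha for c in correlation]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]
```

Such weights would typically multiply the per-class terms of a cross-entropy loss, up-weighting predicates that cannot lean on correlated neighbors.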
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.