Head-Tail Cooperative Learning Network for Unbiased Scene Graph
Generation
- URL: http://arxiv.org/abs/2308.12048v1
- Date: Wed, 23 Aug 2023 10:29:25 GMT
- Title: Head-Tail Cooperative Learning Network for Unbiased Scene Graph
Generation
- Authors: Lei Wang, Zejian Yuan, Yao Lu, Badong Chen
- Abstract summary: Current unbiased Scene Graph Generation (SGG) methods ignore the substantial sacrifice in the prediction of head predicates.
We propose a model-agnostic Head-Tail Collaborative Learning network that includes head-prefer and tail-prefer feature representation branches.
Our method achieves higher mean Recall with a minimal sacrifice in Recall and achieves a new state-of-the-art overall performance.
- Score: 30.467562472064177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG), a critical task in image understanding,
faces the challenge of head-biased prediction caused by the long-tail
distribution of predicates. However, current unbiased SGG methods can easily
prioritize improving the prediction of tail predicates while ignoring the
substantial sacrifice in the prediction of head predicates, leading to a shift
from head bias to tail bias. To address this issue, we propose a model-agnostic
Head-Tail Collaborative Learning (HTCL) network that includes head-prefer and
tail-prefer feature representation branches that collaborate to achieve
accurate recognition of both head and tail predicates. We also propose a
self-supervised learning approach to enhance the prediction ability of the
tail-prefer feature representation branch by constraining tail-prefer predicate
features. Specifically, the self-supervised learning pulls head predicate
features toward their class centers while dispersing tail predicate features as
much as possible, through contrastive learning and a head center loss. We demonstrate
the effectiveness of our HTCL by applying it to various SGG models on VG150,
Open Images V6 and GQA200 datasets. The results show that our method achieves
higher mean Recall with a minimal sacrifice in Recall and achieves a new
state-of-the-art overall performance. Our code is available at
https://github.com/wanglei0618/HTCL.
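The trade-off between Recall and mean Recall mentioned in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `recall_and_mean_recall`, the toy predicates, and the per-relation hit flags are all assumptions made for illustration, and real SGG benchmarks compute these metrics per image over the top-K predicted triplets. The sketch only shows why averaging recall per predicate class exposes head bias that overall Recall hides.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_labels, hits):
    """Return (recall, mean_recall) for a toy set of ground-truth relations.

    gt_labels: predicate class of each ground-truth relation (hypothetical data).
    hits: True where the model recovered that relation.
    """
    total = defaultdict(int)
    found = defaultdict(int)
    for label, hit in zip(gt_labels, hits):
        total[label] += 1
        found[label] += int(hit)
    # Recall pools all relations, so frequent (head) predicates dominate.
    recall = sum(found.values()) / sum(total.values())
    # Mean Recall averages per-class recall, so each predicate counts equally.
    mean_recall = sum(found[c] / total[c] for c in total) / len(total)
    return recall, mean_recall


# Toy long-tail data: 8 head relations ("on"), 2 tail relations ("parked on").
gt = ["on"] * 8 + ["parked on"] * 2
head_biased = [True] * 8 + [False] * 2               # misses every tail relation
tail_biased = [True] * 4 + [False] * 4 + [True] * 2  # sacrifices head relations

print(recall_and_mean_recall(gt, head_biased))
print(recall_and_mean_recall(gt, tail_biased))
```

In this toy setting the head-biased predictions score higher on Recall but lower on mean Recall, and the tail-biased predictions show the reverse, which is the shift from head bias to tail bias that the abstract describes.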
Related papers
- Vision Relation Transformer for Unbiased Scene Graph Generation [31.29954125135073]
Current Scene Graph Generation (SGG) methods suffer from an information loss regarding the entities' local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47 percentage over the state of the art while being 10 times smaller.
arXiv Detail & Related papers (2023-08-18T11:15:31Z)
- Feature Fusion from Head to Tail for Long-Tailed Visual Recognition [39.86973663532936]
The biased decision boundary caused by inadequate semantic information in tail classes is one of the key factors contributing to their low recognition accuracy.
We propose to augment tail classes by grafting the diverse semantic information from head classes, referred to as head-to-tail fusion (H2T).
Both theoretical analysis and practical experimentation demonstrate that H2T can contribute to a more optimized solution for the decision boundary.
arXiv Detail & Related papers (2023-06-12T08:50:46Z)
- Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z)
- Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation [87.13847750383778]
We propose a Dual-branch Hybrid Learning network (DHL) to take care of both head and tail predicates for Scene Graph Generation (SGG).
We show that our approach achieves a new state-of-the-art performance on VG and GQA datasets.
arXiv Detail & Related papers (2022-07-16T11:53:50Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) strategy, which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- Learning To Generate Scene Graph from Head to Tail [65.48134724633472]
We propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT).
CRM first learns head/easy samples to obtain robust features of head predicates and then gradually focuses on tail/hard ones.
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
arXiv Detail & Related papers (2022-06-23T12:16:44Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distributions.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)