Learning To Generate Scene Graph from Head to Tail
- URL: http://arxiv.org/abs/2206.11653v1
- Date: Thu, 23 Jun 2022 12:16:44 GMT
- Title: Learning To Generate Scene Graph from Head to Tail
- Authors: Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song,
Lianli Gao
- Abstract summary: We propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT), containing a Curriculum Re-weight Mechanism (CRM) and a Semantic Context Module (SCM).
CRM first learns head/easy samples to build robust features for head predicates, then gradually shifts focus to tail/hard ones.
SCM relieves semantic deviation by enforcing semantic consistency between the generated scene graph and the ground truth in both global and local representations.
- Score: 65.48134724633472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) represents objects and their interactions with a
graph structure. Recently, many works have been devoted to solving the imbalance
problem in SGG. However, by underestimating head predicates throughout
training, they degrade the features of head predicates that provide general
features for tail ones. Besides, assigning excessive attention to tail
predicates leads to semantic deviation. Based on this, we propose a novel
SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT),
containing Curriculum Re-weight Mechanism (CRM) and Semantic Context Module
(SCM). CRM first learns head/easy samples to build robust features for head
predicates, then gradually shifts focus to tail/hard ones. SCM relieves
semantic deviation by enforcing semantic consistency between the generated
scene graph and the ground truth in both global and local representations.
Experiments show that SGG-HT significantly alleviates the bias problem and
achieves state-of-the-art performance on Visual Genome.
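The abstract describes CRM and SCM only at a high level. As a rough illustration of the curriculum re-weighting idea (start from head/easy samples, then shift weight toward tail/hard ones) and of a graph-level consistency objective, here is a minimal PyTorch sketch. The weight schedule, the interpolation between uniform and inverse-frequency weights, and all function names (curriculum_weights, reweighted_predicate_loss, semantic_consistency_loss) are assumptions made for illustration, not the formulations used in SGG-HT.

```python
import torch
import torch.nn.functional as F

def curriculum_weights(class_counts: torch.Tensor, progress: float) -> torch.Tensor:
    """Per-class loss weights that move from uniform (head/easy first) to
    inverse-frequency (tail/hard focus) as training progresses.

    class_counts: number of training samples per predicate class.
    progress:     training progress in [0, 1] (e.g. epoch / max_epochs).

    NOTE: hypothetical schedule for illustration, not the CRM formulation
    from the SGG-HT paper.
    """
    uniform = torch.ones_like(class_counts, dtype=torch.float)
    inv_freq = class_counts.sum() / class_counts.clamp(min=1).float()
    inv_freq = inv_freq / inv_freq.mean()                  # normalise around 1
    return (1.0 - progress) * uniform + progress * inv_freq

def reweighted_predicate_loss(logits, targets, class_counts, progress):
    """Cross-entropy over predicate classes with the curriculum weights."""
    w = curriculum_weights(class_counts, progress).to(logits.device)
    return F.cross_entropy(logits, targets, weight=w)

def semantic_consistency_loss(pred_graph_emb, gt_graph_emb):
    """Toy stand-in for SCM: pull the embedding of the generated scene graph
    towards the embedding of the ground-truth graph (cosine distance)."""
    return 1.0 - F.cosine_similarity(pred_graph_emb, gt_graph_emb, dim=-1).mean()

# Example: 3 predicate classes with a long-tailed count distribution.
counts = torch.tensor([10000, 500, 20])
print(curriculum_weights(counts, progress=0.0))   # ~uniform  -> head/easy first
print(curriculum_weights(counts, progress=1.0))   # inv-freq  -> tail/hard focus
```

In this toy schedule, progress=0 weights all predicate classes equally so the abundant head classes dominate the loss, while progress=1 up-weights rare (tail) predicates by inverse frequency.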
Related papers
- HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation [13.929906773382752]
A common approach for enabling reasoning over visual data is Scene Graph Generation (SGG).
We propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset.
We show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks.
arXiv Detail & Related papers (2024-03-18T17:59:10Z) - GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives [69.36723767339001]
We propose a novel framework named GPT4SGG to obtain more accurate and comprehensive scene graph signals.
We show GPT4SGG significantly improves the performance of SGG models trained on image-caption data.
arXiv Detail & Related papers (2023-12-07T14:11:00Z) - Head-Tail Cooperative Learning Network for Unbiased Scene Graph
Generation [30.467562472064177]
Current unbiased Scene Graph Generation (SGG) methods ignore the substantial sacrifice in the prediction of head predicates.
We propose a model-agnostic Head-Tail Collaborative Learning network that includes head-prefer and tail-prefer feature representation branches.
Our method achieves higher mean Recall with a minimal sacrifice in Recall, setting a new state-of-the-art overall performance.
arXiv Detail & Related papers (2023-08-23T10:29:25Z) - Vision Relation Transformer for Unbiased Scene Graph Generation [31.29954125135073]
Current Scene Graph Generation (SGG) methods suffer from an information loss regarding the entities' local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47% over the state of the art while being 10 times smaller.
arXiv Detail & Related papers (2023-08-18T11:15:31Z) - Visually-Prompted Language Model for Fine-Grained Scene Graph Generation
in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z) - CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph
Generation [10.724516317292926]
We present a simple yet effective method called Context-Aware Mixture-of-Experts (CAME) to improve the model diversity and alleviate the biased scene graph generator.
We have conducted extensive experiments on three tasks on the Visual Genome dataset to show that CAME achieves superior performance over previous methods.
arXiv Detail & Related papers (2022-08-15T10:39:55Z) - Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation [87.13847750383778]
We propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for Scene Graph Generation (SGG).
We show that our approach achieves a new state-of-the-art performance on VG and GQA datasets.
arXiv Detail & Related papers (2022-07-16T11:53:50Z) - From General to Specific: Informative Scene Graph Generation via Balance
Adjustment [113.04103371481067]
Current models are stuck in common predicates, e.g., "on" and "at", rather than informative ones.
We propose BA-SGG, a framework based on balance adjustment but not the conventional distribution fitting.
Our method achieves 14.3%, 8.0%, and 6.1% higher Mean Recall (mR) than the Transformer model on three scene graph generation sub-tasks on Visual Genome.
arXiv Detail & Related papers (2021-08-30T11:39:43Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph
Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)