ALF: Adaptive Label Finetuning for Scene Graph Generation
- URL: http://arxiv.org/abs/2312.17425v3
- Date: Tue, 26 Nov 2024 01:55:07 GMT
- Title: ALF: Adaptive Label Finetuning for Scene Graph Generation
- Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song,
- Abstract summary: Scene Graph Generation endeavors to predict the relationships between subjects and objects in a given image.
Long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG.
We introduce one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions.
ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
- Score: 116.59868289196157
- License:
- Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine-grained ones across the entire dataset. However, these methods encounter two primary challenges: 1) They overlook the inherent context constraints imposed by subject-object pairs, leading to erroneous relations transfer. 2) Additional retraining process are required after the data transfer, which incurs substantial computational costs. To overcome these limitations, we introduce the first plug-and-play one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions and meanwhile significantly enhance models' relation recognition capability across various SGG benchmark approaches. Specifically, ALF consists of two components: Adaptive Label Construction (ALC) and Adaptive Iterative Learning (AIL). By imposing Predicate-Context Constraints within relation space, ALC adaptively re-ranks and selects candidate relations in reference to model's predictive logits utilizing the Restriction-Based Judgment techniques, achieving robust relation transfer. Supervised with labels transferred by ALC, AIL iteratively finetunes the SGG models in an auto-regressive manner, which mitigates the substantial computational costs arising from the retraining process. Extensive experiments demonstrate that ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
Related papers
- Progressive Multi-Level Alignments for Semi-Supervised Domain Adaptation SAR Target Recognition Using Simulated Data [3.1951121258423334]
We develop an instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes.
We also develop an instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes.
arXiv Detail & Related papers (2024-11-07T13:53:13Z) - Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction [12.319354506916547]
We propose a novel Sample-Level Bias Prediction (SBP) method for fine-grained Scene Graph Generation (SGG)
Firstly, we train a classic SGG model and construct a correction bias set.
Then, we devise a Bias-Oriented Generative Adversarial Network (BGAN) that learns to predict the constructed correction biases.
arXiv Detail & Related papers (2024-07-27T13:49:06Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - DDGHM: Dual Dynamic Graph with Hybrid Metric Training for Cross-Domain
Sequential Recommendation [15.366783212837515]
Sequential Recommendation (SR) characterizes evolving patterns of user behaviors by modeling how users transit among items.
To solve this problem, we focus on Cross-Domain Sequential Recommendation (CDSR)
We propose DDGHM, a novel framework for the CDSR problem, which includes two main modules, dual dynamic graph modeling and hybrid metric training.
arXiv Detail & Related papers (2022-09-21T07:53:06Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased
Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph
Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.