ALF: Adaptive Label Finetuning for Scene Graph Generation
- URL: http://arxiv.org/abs/2312.17425v2
- Date: Thu, 23 May 2024 06:54:18 GMT
- Title: ALF: Adaptive Label Finetuning for Scene Graph Generation
- Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song,
- Abstract summary: Scene Graph Generation endeavors to predict the relationships between subjects and objects in a given image.
Long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG.
We introduce one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions.
ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
- Score: 116.59868289196157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine-grained ones across the entire dataset. However, these methods encounter two primary challenges: 1) They overlook the inherent context constraints imposed by subject-object pairs, leading to erroneous relations transfer. 2) Additional retraining process are required after the data transfer, which incurs substantial computational costs. To overcome these limitations, we introduce the first plug-and-play one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions and meanwhile significantly enhance models' relation recognition capability across various SGG benchmark approaches. Specifically, ALF consists of two components: Adaptive Label Construction (ALC) and Adaptive Iterative Learning (AIL). By imposing Predicate-Context Constraints within relation space, ALC adaptively re-ranks and selects candidate relations in reference to model's predictive logits utilizing the Restriction-Based Judgment techniques, achieving robust relation transfer. Supervised with labels transferred by ALC, AIL iteratively finetunes the SGG models in an auto-regressive manner, which mitigates the substantial computational costs arising from the retraining process. Extensive experiments demonstrate that ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
Related papers
- Autoencoder based approach for the mitigation of spurious correlations [2.7624021966289605]
Spurious correlations refer to erroneous associations in data that do not reflect true underlying relationships.
These correlations can lead deep neural networks (DNNs) to learn patterns that are not robust across diverse datasets or real-world scenarios.
We propose an autoencoder-based approach to analyze the nature of spurious correlations that exist in the Global Wheat Head Detection (GWHD) 2021 dataset.
arXiv Detail & Related papers (2024-06-27T05:28:44Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - DDGHM: Dual Dynamic Graph with Hybrid Metric Training for Cross-Domain
Sequential Recommendation [15.366783212837515]
Sequential Recommendation (SR) characterizes evolving patterns of user behaviors by modeling how users transit among items.
To solve this problem, we focus on Cross-Domain Sequential Recommendation (CDSR)
We propose DDGHM, a novel framework for the CDSR problem, which includes two main modules, dual dynamic graph modeling and hybrid metric training.
arXiv Detail & Related papers (2022-09-21T07:53:06Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased
Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z) - Not All Relations are Equal: Mining Informative Labels for Scene Graph
Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph
Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.