Related papers: ALF: Adaptive Label Finetuning for Scene Graph Generation

ALF: Adaptive Label Finetuning for Scene Graph Generation

URL: http://arxiv.org/abs/2312.17425v2
Date: Thu, 23 May 2024 06:54:18 GMT
Title: ALF: Adaptive Label Finetuning for Scene Graph Generation
Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song,
Abstract summary: Scene Graph Generation endeavors to predict the relationships between subjects and objects in a given image. Long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. We introduce one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions. ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.
Score: 116.59868289196157
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine-grained ones across the entire dataset. However, these methods encounter two primary challenges: 1) They overlook the inherent context constraints imposed by subject-object pairs, leading to erroneous relations transfer. 2) Additional retraining process are required after the data transfer, which incurs substantial computational costs. To overcome these limitations, we introduce the first plug-and-play one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions and meanwhile significantly enhance models' relation recognition capability across various SGG benchmark approaches. Specifically, ALF consists of two components: Adaptive Label Construction (ALC) and Adaptive Iterative Learning (AIL). By imposing Predicate-Context Constraints within relation space, ALC adaptively re-ranks and selects candidate relations in reference to model's predictive logits utilizing the Restriction-Based Judgment techniques, achieving robust relation transfer. Supervised with labels transferred by ALC, AIL iteratively finetunes the SGG models in an auto-regressive manner, which mitigates the substantial computational costs arising from the retraining process. Extensive experiments demonstrate that ALF achieves a 16% improvement in mR@100 compared to the typical SGG method Motif, with only a 6% increase in calculation costs compared to the state-of-the-art method IETrans.

Related papers

A Scalable Pretraining Framework for Link Prediction with Efficient Adaptation [16.82426251068573]
Link Prediction (LP) is a critical task in graph machine learning.<n>Existing methods face key challenges including limited supervision from sparse connectivity.<n>We explore pretraining as a solution to address these challenges.
arXiv Detail & Related papers (2025-08-06T17:10:31Z)
TransDF: Time-Series Forecasting Needs Transformed Label Alignment [53.33409515800757]
We propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance.<n>Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount.
arXiv Detail & Related papers (2025-05-23T13:00:35Z)
Asymmetric Co-Training for Source-Free Few-Shot Domain Adaptation [5.611768906855499]
We propose an asymmetric co-training (ACT) method specifically designed for the SFFSDA scenario. We use a two-step optimization process to train the target model. Our findings suggest that adapting a source pre-trained model using only a small amount of labeled target data offers a practical and dependable solution.
arXiv Detail & Related papers (2025-02-20T02:58:45Z)
RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning [24.52282123604646]
Scene Graph Generation (SGG) research has suffered from two fundamental challenges: the long-tailed predicate distribution and semantic ambiguity between predicates. We propose Retrieval-Augmented Scene Graph Generation (RA-SGG), which identifies potential instances to be multi-labeled and enriches the single-label with multi-labels that are semantically similar to the original label. RA-SGG effectively alleviates the issue of biased prediction caused by the long-tailed distribution and semantic ambiguity of predicates.
arXiv Detail & Related papers (2024-12-17T10:47:13Z)
Progressive Multi-Level Alignments for Semi-Supervised Domain Adaptation SAR Target Recognition Using Simulated Data [3.1951121258423334]
We develop an instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes. We also develop an instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes.
arXiv Detail & Related papers (2024-11-07T13:53:13Z)
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction [12.319354506916547]
We propose a novel Sample-Level Bias Prediction (SBP) method for fine-grained Scene Graph Generation (SGG) Firstly, we train a classic SGG model and construct a correction bias set. Then, we devise a Bias-Oriented Generative Adversarial Network (BGAN) that learns to predict the constructed correction biases.
arXiv Detail & Related papers (2024-07-27T13:49:06Z)
Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution. We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
DDGHM: Dual Dynamic Graph with Hybrid Metric Training for Cross-Domain Sequential Recommendation [15.366783212837515]
Sequential Recommendation (SR) characterizes evolving patterns of user behaviors by modeling how users transit among items. To solve this problem, we focus on Cross-Domain Sequential Recommendation (CDSR) We propose DDGHM, a novel framework for the CDSR problem, which includes two main modules, dual dynamic graph modeling and hybrid metric training.
arXiv Detail & Related papers (2022-09-21T07:53:06Z)
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories. We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG. Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets in images. Recent works have made a steady progress on SGG, and provide useful tools for high-level vision and language understanding. We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a play-and-plug fashion and expanded to large SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z)
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph. We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z)
Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images. We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights. Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.