Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
- URL: http://arxiv.org/abs/2403.17709v1
- Date: Tue, 26 Mar 2024 13:56:34 GMT
- Title: Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
- Authors: Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim
- Abstract summary: Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently.
We identify two key limitations in a conventional label assignment for training Transformer-based VRD models.
Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ) are proposed to address these issues.
- Score: 21.352923995507595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
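The two mechanisms are easy to picture in code. Below is a minimal, illustrative sketch of the assignment step only, assuming plain CPU tensors; the function name, the `quality` matrix, and the top-k rule standing in for the paper's quality criterion are assumptions for illustration, not the official implementation (see the linked repository for that).

```python
import torch
from scipy.optimize import linear_sum_assignment

def speaq_style_assign(cost, quality, query_group, rel_group, extra_k=2):
    """Sketch of a SpeaQ-style label assignment (illustrative only).

    cost:        (Q, G) matching cost between Q queries and G ground truths
    quality:     (Q, G) score of how close each prediction is to each GT
    query_group: (Q,) group id of each query
    rel_group:   (G,) group id of each GT relation
    extra_k:     extra high-quality predictions each GT may also receive
    """
    # Groupwise Query Specialization: a query may only match GTs whose
    # relation falls in the query's own group.
    masked = cost.clone()
    masked[query_group[:, None] != rel_group[None, :]] = 1e9

    # Standard one-to-one Hungarian matching on the group-masked costs.
    rows, cols = linear_sum_assignment(masked.numpy())
    assign = [(int(q), int(g)) for q, g in zip(rows, cols) if masked[q, g] < 1e9]

    # Quality-Aware Multi-Assignment: each GT is additionally assigned to
    # up to extra_k unmatched same-group predictions of highest quality.
    matched = {q for q, _ in assign}
    for g in range(cost.shape[1]):
        candidates = (query_group == rel_group[g]).nonzero(as_tuple=True)[0]
        ranked = sorted(candidates.tolist(), key=lambda q: -quality[q, g].item())
        added = 0
        for q in ranked:
            if q not in matched:
                assign.append((q, g))
                matched.add(q)
                added += 1
                if added == extra_k:
                    break
    return assign
```

Note that the assignment only changes which predictions receive supervision during training; the model and its forward pass are untouched, which is why the paper reports zero additional inference cost.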
Related papers
- A General Model for Aggregating Annotations Across Simple, Complex, and Multi-Object Annotation Tasks [51.14185612418977]
A strategy to improve label quality is to ask multiple annotators to label the same item and aggregate their labels.
While a variety of bespoke models have been proposed for specific tasks, our work is the first to introduce aggregation methods that generalize across many diverse complex tasks.
This article extends our prior work with investigation of three new research questions.
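As a toy illustration of the aggregation setting (majority voting, the simplest baseline that the paper's general model goes beyond; the function below is hypothetical):

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate several annotators' labels per item by majority vote.

    labels_per_item: dict mapping item id -> list of labels from annotators
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Three annotators label two images; disagreements resolve by majority.
print(majority_vote({"img1": ["cat", "cat", "dog"], "img2": ["dog"] * 3}))
# {'img1': 'cat', 'img2': 'dog'}
```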
arXiv Detail & Related papers (2023-12-20T21:28:35Z)
- Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs [29.863085746761556]
We propose Query to Triple (Q2T), a novel approach that decouples the training for simple and complex queries.
Our proposed Q2T is not only efficient to train, but also modular, thus easily adaptable to various neural link predictors.
arXiv Detail & Related papers (2023-10-17T13:13:30Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution problem.
We employ a DETR-based encoder-decoder design with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Language-aware Multiple Datasets Detection Pretraining for DETRs [4.939595148195813]
We propose a framework for utilizing multiple datasets to pretrain DETR-like detectors, termed METR.
It converts the typical multi-classification in object detection into binary classification by introducing a pre-trained language model.
We show that METR achieves strong results under both multi-task joint training and the pretrain & finetune paradigm.
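One plausible reading of this conversion is to score every (region, class-name) pair independently with a sigmoid instead of a softmax over a fixed label space, so label sets from different datasets can be mixed. The sketch below assumes precomputed class-name embeddings and is not the official METR code:

```python
import torch
import torch.nn.functional as F

def binary_cls_logits(region_feats, class_text_embs):
    """Score each (region, class) pair independently.

    region_feats:    (R, D) visual features from a DETR-like decoder
    class_text_embs: (C, D) embeddings of class names from a pre-trained
                     language model (assumed precomputed and frozen)
    """
    return region_feats @ class_text_embs.t()  # (R, C) pairwise logits

# Toy usage: per-entry binary cross-entropy replaces multi-class softmax.
logits = binary_cls_logits(torch.randn(5, 256), torch.randn(10, 256))
targets = torch.zeros(5, 10)
targets[0, 3] = 1.0  # region 0 matches class 3
loss = F.binary_cross_entropy_with_logits(logits, targets)
```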
arXiv Detail & Related papers (2023-04-07T10:34:04Z)
- Team DETR: Guide Queries as a Professional Team in Detection Transformers [31.521916994653235]
We propose Team DETR, which leverages query collaboration and position constraints to embrace objects of interest more precisely.
We also dynamically cater to each query member's prediction preference, offering the query better scale and spatial priors.
In addition, the proposed Team DETR is flexible enough to be adapted to other existing DETR variants without adding parameters or computation.
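A minimal sketch of the query-group idea, with the round-robin partition and per-group scale prior as assumed details rather than the authors' implementation:

```python
import torch

num_queries, dim, num_groups = 300, 256, 3
queries = torch.nn.Parameter(torch.randn(num_queries, dim))
# Assumption for illustration: each group specializes in one object scale.
scale_prior = torch.nn.Embedding(num_groups, dim)   # small / medium / large
group_id = torch.arange(num_queries) % num_groups   # round-robin partition
# Each query is offset by its group's prior before entering the decoder,
# so members of a group share a scale/spatial preference.
grouped_queries = queries + scale_prior(group_id)   # (300, 256)
```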
arXiv Detail & Related papers (2023-02-14T15:21:53Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvements in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively.
arXiv Detail & Related papers (2021-09-21T17:59:31Z)
- Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore capitalizing on domain-specific, topically relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z)
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
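QuReTeC's core is a binary term classifier: given the conversation history together with the current turn, predict which history terms should be added to the current query. A minimal sketch with a generic Hugging Face token-classification head; the model name and the decoding heuristic are assumptions, and the untrained head shown here would need fine-tuning (e.g., with the paper's distant supervision) to be useful:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Any bidirectional encoder works in principle; QuReTeC is BERT-based.
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

history = "who wrote the iliad"
current = "when was it written"
enc = tok(history, current, return_tensors="pt")  # history/turn as a pair
with torch.no_grad():
    logits = model(**enc).logits                  # (1, T, 2)
keep = logits.argmax(-1)[0].bool()                # 1 = add term to query
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
expansion = [t for t, k in zip(tokens, keep) if k and t.isalpha()]
resolved_query = current + " " + " ".join(expansion)
```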
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
- Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
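The coarse-to-fine idea can be pictured as a two-stage relevance pipeline; the word-overlap scorer below is a crude stand-in for the paper's learned relevance modules:

```python
def overlap(query: str, text: str) -> float:
    """Fraction of query words appearing in the text (toy relevance score)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def coarse_to_fine(query, documents, k_docs=2, k_sents=3):
    """Coarse stage keeps the top documents; fine stage ranks their sentences."""
    docs = sorted(documents, key=lambda d: -overlap(query, d))[:k_docs]
    sentences = [s for d in docs for s in d.split(". ")]
    return sorted(sentences, key=lambda s: -overlap(query, s))[:k_sents]
```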
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.