Related papers: PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction

URL: http://arxiv.org/abs/2406.04746v1
Date: Fri, 7 Jun 2024 08:46:19 GMT
Title: PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Authors: Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu, Josiane Mothe, Radu Tudor Ionescu
Abstract summary: We propose the first benchmark for joint text-to-image prompt and query performance prediction, comprising 10K queries. We present results with several pre-generation/retrieval and post-generation/retrieval performance predictors, thus providing competitive baselines for future research.
Score: 21.5792639312506
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image generation has recently emerged as a viable alternative to text-to-image retrieval, due to the visually impressive results of generative diffusion models. Although query performance prediction is an active research topic in information retrieval, to the best of our knowledge, there is no prior study that analyzes the difficulty of queries (prompts) in text-to-image generation, based on human judgments. To this end, we introduce the first dataset of prompts which are manually annotated in terms of image generation performance. In order to determine the difficulty of the same prompts in image retrieval, we also collect manual annotations that represent retrieval performance. We thus propose the first benchmark for joint text-to-image prompt and query performance prediction, comprising 10K queries. Our benchmark enables: (i) the comparative assessment of the difficulty of prompts/queries in image generation and image retrieval, and (ii) the evaluation of prompt/query performance predictors addressing both generation and retrieval. We present results with several pre-generation/retrieval and post-generation/retrieval performance predictors, thus providing competitive baselines for future research. Our benchmark and code are publicly available under the CC BY 4.0 license at https://github.com/Eduard6421/PQPP.

Related papers

VQPP: Video Query Performance Prediction Benchmark [22.214338497366082]
We propose the first benchmark for video query performance prediction (VQPP). VQPP contains a total of 56K text queries and 51K videos, and comes with official training, validation and test splits. We explore multiple pre-retrieval and post-retrieval performance predictors, creating a representative benchmark for future exploration of QPP in the video domain.
arXiv Detail & Related papers (2026-02-19T20:32:25Z)
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
Unified Text-to-Image Generation and Retrieval [96.72318842152148]
We propose a unified framework in the context of Multimodal Large Language Models (MLLMs). We first explore the intrinsic discriminative abilities of MLLMs and introduce a generative retrieval method to perform retrieval in a training-free manner. We then unify generation and retrieval in an autoregressive generation way and propose an autonomous decision module to choose the best-matched one between generated and retrieved images.
arXiv Detail & Related papers (2024-06-09T15:00:28Z)
Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries. We train a dual-encoder model starting from a language model pretrained on a large text corpus. Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z)
Text-based Person Search without Parallel Image-Text Data [52.63433741872629]
Text-based person search (TBPS) aims to retrieve the images of the target person from a large image gallery based on a given natural language description. Existing methods are dominated by training models with parallel image-text pairs, which are very costly to collect. In this paper, we make the first attempt to explore TBPS without parallel image-text data.
arXiv Detail & Related papers (2023-05-22T12:13:08Z)
Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening [53.1711708318581]
Current image-text retrieval methods suffer from $N$-related time complexity. This paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval.
arXiv Detail & Related papers (2023-03-14T09:36:42Z)
iQPP: A Benchmark for Image Query Performance Prediction [24.573869540845124]
We propose the first benchmark for image query performance prediction (iQPP). We estimate the ground-truth difficulty of each query as the average precision or the precision@k, using two state-of-the-art image retrieval models. Next, we propose and evaluate novel pre-retrieval and post-retrieval query performance predictors, comparing them with existing or adapted (from text to image) predictors. Our comprehensive experiments indicate that iQPP is a challenging benchmark, revealing an important research gap that needs to be addressed in future work.
arXiv Detail & Related papers (2023-02-20T17:56:57Z)
Progressive Learning for Image Retrieval with Hybrid-Modality Queries [48.79599320198615]
This work addresses image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR). We decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. Our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
arXiv Detail & Related papers (2022-04-24T08:10:06Z)
Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval. We first examine the related concerns and why the focus is on image-text retrieval tasks. We analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.