FreestyleRet: Retrieving Images from Style-Diversified Queries
- URL: http://arxiv.org/abs/2312.02428v2
- Date: Fri, 8 Dec 2023 14:30:37 GMT
- Title: FreestyleRet: Retrieving Images from Style-Diversified Queries
- Authors: Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan
- Abstract summary: Style-Diversified Query-Based Image Retrieval task enables retrieval based on various query styles.
We propose the first Diverse-Style Retrieval dataset, encompassing diverse query styles including text, sketch, low-resolution, and art.
Our model, employing the style-init prompt tuning strategy, outperforms existing retrieval models on the style-diversified retrieval task.
- Score: 17.253021422951928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image Retrieval aims to retrieve corresponding images based on a given query.
In application scenarios, users intend to express their retrieval intent
through various query styles. However, current retrieval tasks predominantly
focus on text-query retrieval exploration, leading to limited retrieval query
options and potential ambiguity or bias in user intention. In this paper, we
propose the Style-Diversified Query-Based Image Retrieval task, which enables
retrieval based on various query styles. To facilitate the novel setting, we
propose the first Diverse-Style Retrieval dataset, encompassing diverse query
styles including text, sketch, low-resolution, and art. We also propose a
lightweight style-diversified retrieval framework. For various query style
inputs, we apply the Gram Matrix to extract the query's textural features and
cluster them into a style space with style-specific bases. Then we employ the
style-init prompt tuning module to enable the visual encoder to comprehend the
texture and style information of the query. Experiments demonstrate that our
model, employing the style-init prompt tuning strategy, outperforms existing
retrieval models on the style-diversified retrieval task. Moreover,
style-diversified queries (sketch+text, art+text, etc.) can be simultaneously
retrieved in our model. The auxiliary information from other queries enhances
the retrieval performance within the respective query.
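The Gram-matrix step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a feature map flattened to one row of activations per channel, and it stands in for the paper's clustering with a simple nearest-basis lookup under Frobenius distance. The function names, the normalization by the number of spatial positions, and the distance choice are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Return the C x C Gram matrix of a feature map.

    `features` holds C rows (one per channel), each a flattened list of N
    spatial activations. Entry (i, j) is the inner product of channels i
    and j, divided by N (an assumed normalization).
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in features]
        for row_i in features
    ]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of the element-wise difference of two matrices."""
    return sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def nearest_style_basis(gram: Matrix, bases: List[Matrix]) -> int:
    """Index of the style basis closest to `gram` (hypothetical assignment rule)."""
    return min(range(len(bases)), key=lambda k: frobenius_distance(gram, bases[k]))


if __name__ == "__main__":
    # Two channels, four spatial positions (toy values).
    features = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    gram = gram_matrix(features)
    # Two made-up style bases; the second matches the toy Gram matrix.
    bases = [[[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]]]
    print(gram, nearest_style_basis(gram, bases))
```

In this sketch the Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is why it is commonly used as a texture or style descriptor. How the selected basis initializes the prompt tokens is not shown here.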
Related papers
- Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education [30.071212702797016]
In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching.
In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions.
We introduce the STEM Education Retrieval dataset (SER), which contains over 24,000 query pairs of different styles, and the Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning.
arXiv Detail & Related papers (2025-02-09T11:46:05Z)
- Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval [15.757140563856675]
This work introduces a novel task that focuses on suggesting minimal textual modifications needed to explore visually consistent subsets of the collection.
To facilitate the evaluation and development of methods, we present a tailored benchmark named CroQS.
Baseline methods from related fields, such as image captioning and content summarization, are adapted for this task to provide reference performance scores.
arXiv Detail & Related papers (2024-12-18T13:24:09Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Measuring Style Similarity in Diffusion Models [118.22433042873136]
We present a framework for understanding and extracting style descriptors from images.
Our framework comprises a new dataset curated using the insight that style is a subjective property of an image.
We also propose a method to extract style attribute descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model.
arXiv Detail & Related papers (2024-04-01T17:58:30Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model, "ReViz", that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- Decomposing Complex Queries for Tip-of-the-tongue Retrieval [72.07449449115167]
Complex queries describe content elements (e.g., book characters or events), information beyond the document text.
This retrieval setting, called tip of the tongue (TOT), is especially challenging for models reliant on lexical and semantic overlap between query and document text.
We introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results.
arXiv Detail & Related papers (2023-05-24T11:43:40Z)
- Progressive Learning for Image Retrieval with Hybrid-Modality Queries [48.79599320198615]
This work addresses image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR).
We decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries.
Our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets respectively.
arXiv Detail & Related papers (2022-04-24T08:10:06Z)
- Probabilistic Compositional Embeddings for Multimodal Image Retrieval [48.450232527041436]
We investigate a more challenging scenario for composing multiple multimodal queries in image retrieval.
Given an arbitrary number of query images and (or) texts, our goal is to retrieve target images containing the semantic concepts specified in multiple multimodal queries.
We propose a novel multimodal probabilistic composer (MPC) to learn an informative embedding that can flexibly encode the semantics of various queries.
arXiv Detail & Related papers (2022-04-12T14:45:37Z)
- ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity [16.550790981646276]
Current approaches combine the features of each of the two elements of the query into a single representation.
Our work aims at shedding new light on the task by looking at it through the prism of two familiar and related frameworks: text-to-image and image-to-image retrieval.
arXiv Detail & Related papers (2022-03-15T17:29:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.