QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
- URL: http://arxiv.org/abs/2507.12416v1
- Date: Wed, 16 Jul 2025 17:06:33 GMT
- Title: QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
- Authors: Jaehyun Kwak, Ramahdani Muhammad Izaaz Inhar, Se-Young Yun, Sung-Ju Lee
- Abstract summary: Composed Image Retrieval (CIR) retrieves relevant images based on a reference image and accompanying text describing desired modifications. Existing methods disregard the relevance of non-target images: most employ contrastive learning, treating the target image as positive and all other images in the batch as negatives, which can inadvertently include false negatives. We propose Query-Relevant Retrieval through Hard Negative Sampling (QuRe), which optimizes a reward model objective to reduce false negatives.
- Score: 24.699637275626998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Composed Image Retrieval (CIR) retrieves relevant images based on a reference image and accompanying text describing desired modifications. However, existing CIR methods focus only on retrieving the target image and disregard the relevance of other images. This limitation arises because most methods employ contrastive learning, which treats the target image as positive and all other images in the batch as negatives, and can therefore inadvertently include false negatives. This may result in retrieving irrelevant images, reducing user satisfaction even when the target image is retrieved. To address this issue, we propose Query-Relevant Retrieval through Hard Negative Sampling (QuRe), which optimizes a reward model objective to reduce false negatives. Additionally, we introduce a hard negative sampling strategy that selects images positioned between two steep drops in relevance scores following the target image, effectively filtering false negatives. To evaluate CIR models on their alignment with human satisfaction, we create Human-Preference FashionIQ (HP-FashionIQ), a new dataset that explicitly captures user preferences beyond target retrieval. Extensive experiments demonstrate that QuRe achieves state-of-the-art performance on the FashionIQ and CIRR datasets while exhibiting the strongest alignment with human preferences on the HP-FashionIQ dataset. The source code is available at https://github.com/jackwaky/QuRe.
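The sampling rule in the abstract can be sketched concretely. Below is a minimal PyTorch illustration, not the released implementation: the function name, the fixed top-two-gap heuristic, and the cap `k` are our assumptions.

```python
import torch

def sample_hard_negatives(scores: torch.Tensor, target_idx: int, k: int = 8) -> torch.Tensor:
    """Pick candidates between the two steepest relevance drops below the target."""
    # Rank all candidates by relevance, highest first.
    order = torch.argsort(scores, descending=True)
    target_rank = (order == target_idx).nonzero(as_tuple=True)[0].item()

    # Candidates ranked after the target, and their (descending) scores.
    tail_ids = order[target_rank + 1:]
    tail = scores[tail_ids]

    # Score drop between consecutive ranks; assumes at least 3 trailing candidates.
    gaps = tail[:-1] - tail[1:]
    d1, d2 = sorted(torch.topk(gaps, 2).indices.tolist())

    # Items above the first drop are likely false negatives; items below the
    # second drop are trivially easy. Keep the slice in between as hard negatives.
    return tail_ids[d1 + 1 : d2 + 1][:k]
```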
Related papers
- Zero Shot Composed Image Retrieval [0.0]
Composed image retrieval (CIR) allows a user to locate a target image by applying a fine-grained textual edit. Zero-shot CIR, which embeds the image and the text with separate pretrained vision-language encoders, reaches only 20-25% Recall@10 on the FashionIQ benchmark. We improve this by fine-tuning BLIP-2 with a lightweight Q-Former that fuses visual and textual features into a single embedding.
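As a rough sketch of the fusion step (not BLIP-2's actual Q-Former; dimensions, head counts, and the mean-pooled output are illustrative), learnable query tokens can cross-attend to image features and then to text features to yield one retrieval embedding:

```python
import torch
import torch.nn as nn

class QFormerFusion(nn.Module):
    """Toy Q-Former-style fusion: query tokens read the image, then the text."""
    def __init__(self, dim: int = 768, num_queries: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.img_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.txt_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, N_img, dim); text_feats: (B, N_txt, dim)
        q = self.queries.expand(image_feats.size(0), -1, -1)
        q, _ = self.img_attn(q, image_feats, image_feats)  # attend to the image
        q, _ = self.txt_attn(q, text_feats, text_feats)    # then to the edit text
        return self.proj(q.mean(dim=1))                    # single fused embedding
```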
arXiv Detail & Related papers (2025-06-07T00:38:43Z)
- Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval [52.709090256954276]
Zero-Shot Composed Image Retrieval (ZS-CIR) aims to retrieve target images given a compositional query. We propose a novel framework that employs a Multimodal Reasoning Agent (MRA) for ZS-CIR.
arXiv Detail & Related papers (2025-05-26T13:17:50Z)
- NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval [16.460121977322224]
Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query that combines an image with modification text to pinpoint the target. However, query-target pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors. We propose Noise-aware Contrastive Learning for CIR (NCL-CIR), comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB).
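A hedged sketch of what such a loss could look like (the weighting scheme and threshold below are our stand-ins, not the paper's WCB/NFB definitions): down-weight pairs with a high estimated noise score and drop the worst entirely.

```python
import torch
import torch.nn.functional as F

def noise_aware_info_nce(q, t, noise_score, tau=0.07, drop_thresh=0.8):
    """InfoNCE over a batch, reweighted by a per-pair noise estimate in [0, 1]."""
    q, t = F.normalize(q, dim=-1), F.normalize(t, dim=-1)
    logits = q @ t.T / tau                                   # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)
    per_pair = F.cross_entropy(logits, labels, reduction="none")

    # Compensate: cleaner pairs count more. Filter: very noisy pairs count zero.
    weight = (1.0 - noise_score) * (noise_score < drop_thresh).float()
    return (weight * per_pair).sum() / weight.sum().clamp(min=1e-6)
```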
arXiv Detail & Related papers (2025-04-06T03:27:23Z)
- Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples [7.883521157895832]
Securing a sufficient amount of paired data is important for training an image-text retrieval (ITR) model.
We propose an active learning algorithm for ITR that can collect paired data cost-efficiently.
We validate the effectiveness of the proposed method on Flickr30K and MS-COCO datasets.
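One plausible reading of the acquisition step, sketched below (our simplification, not the paper's exact criterion): unpaired image-text candidates that the current model already scores as near-matches are the hardest, hence the most informative to annotate.

```python
import torch
import torch.nn.functional as F

def select_hard_unpaired(img_emb, txt_emb, budget: int = 100):
    """Rank all unpaired (image, text) candidates by similarity; label the top ones."""
    sim = F.normalize(img_emb, dim=-1) @ F.normalize(txt_emb, dim=-1).T  # (N_img, N_txt)
    top = torch.topk(sim.flatten(), budget).indices
    rows, cols = top // sim.size(1), top % sim.size(1)
    return list(zip(rows.tolist(), cols.tolist()))  # indices to send to annotators
```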
arXiv Detail & Related papers (2024-05-25T16:50:33Z)
- Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives [20.37803751979975]
The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modification text.
We propose a data generation method by leveraging a multi-modal large language model to construct triplets for CIR.
Our method effectively scales positives and negatives and achieves state-of-the-art results on both FashionIQ and CIRR datasets.
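The triplet-construction loop might look like the sketch below, where `describe_modification` is a hypothetical wrapper around an MLLM (not an API from the paper):

```python
def build_cir_triplets(image_pairs, describe_modification):
    """Turn (reference, target) image pairs into CIR training triplets."""
    triplets = []
    for ref_img, tgt_img in image_pairs:
        # Hypothetical MLLM call returning text that edits ref into target,
        # e.g. "make the dress sleeveless and navy blue".
        mod_text = describe_modification(ref_img, tgt_img)
        triplets.append((ref_img, mod_text, tgt_img))
    return triplets
```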
arXiv Detail & Related papers (2024-04-17T12:30:54Z)
- VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering [68.47402250389685]
This work provides a Visual Question Answering (VQA) perspective to boost the performance of CIR.
The resulting VQA4CIR is a post-processing approach and can be directly plugged into existing CIR methods.
Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.
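A minimal sketch of such post-processing (`vqa_answer` is a hypothetical callable, and the question/answer format is our assumption): verify each top-ranked candidate against questions derived from the modification text and re-rank by how many it satisfies.

```python
def vqa_rerank(candidates, questions, expected_answers, vqa_answer):
    """Re-rank candidate images by agreement with query-derived QA pairs."""
    def agreement(image) -> float:
        hits = sum(vqa_answer(image, q).strip().lower() == a.lower()
                   for q, a in zip(questions, expected_answers))
        return hits / max(len(questions), 1)
    return sorted(candidates, key=agreement, reverse=True)
```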
arXiv Detail & Related papers (2023-12-19T15:56:08Z)
- Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection [4.0208298639821525]
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community.
Recent studies show that adapting a pre-trained model or modified loss function can improve performance.
We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN.
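A sketch of the re-scoring idea (the geometric blend and `alpha` are illustrative choices, not RISF's exact calibration): combine the detector's class scores with CLIP-style similarity between each box crop and the class names.

```python
import torch
import torch.nn.functional as F

def rescore_detections(cls_scores, crop_embs, class_text_embs, alpha: float = 0.5):
    """Blend Faster R-CNN class scores with image-language similarity."""
    sim = (F.normalize(crop_embs, dim=-1) @
           F.normalize(class_text_embs, dim=-1).T).softmax(dim=-1)  # (N_box, N_class)
    return cls_scores.pow(alpha) * sim.pow(1.0 - alpha)
```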
arXiv Detail & Related papers (2023-11-01T04:04:34Z)
- Sentence-level Prompts Benefit Composed Image Retrieval [69.78119883060006]
Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption.
We propose to leverage pretrained V-L models, e.g., BLIP-2, to generate sentence-level prompts.
Our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets.
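The prompt itself can be as simple as the sketch below (the template is our assumption; the paper generates prompts with a pretrained V-L model rather than a fixed string rule):

```python
def build_sentence_prompt(image_caption: str, relative_caption: str) -> str:
    """Merge a reference-image caption with the relative caption into one prompt."""
    return f"{image_caption.rstrip('.')}, but {relative_caption}"

# e.g. build_sentence_prompt("a long red floral dress", "shorter and in navy blue")
# -> "a long red floral dress, but shorter and in navy blue"
```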
arXiv Detail & Related papers (2023-10-09T07:31:44Z)
- Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration [79.04007257606862]
This paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself.
Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks.
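One way to realize this, sketched under our own assumptions (the EMA schedule and momentum value are illustrative): keep a lagged copy of the model whose outputs serve as negatives.

```python
import copy
import torch

class HistoryNegatives:
    """Maintain a lagged (EMA) copy of the model; its outputs act as negatives."""
    def __init__(self, model, momentum: float = 0.999):
        self.lagged = copy.deepcopy(model).eval()
        for p in self.lagged.parameters():
            p.requires_grad_(False)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, model):
        for p_lag, p in zip(self.lagged.parameters(), model.parameters()):
            p_lag.mul_(self.momentum).add_(p, alpha=1.0 - self.momentum)

    @torch.no_grad()
    def negatives(self, degraded_input):
        return self.lagged(degraded_input)  # the model's own past prediction
```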
arXiv Detail & Related papers (2023-09-12T07:50:54Z)
- Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
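A sketch of sampling-based selection in this spirit (the `skip_top` cutoff and temperature are our knobs, not the paper's exact distribution): skip the very top-ranked candidates, which are the most likely false negatives, and sample the rest in proportion to similarity.

```python
import torch

def sample_negatives(sims, pos_idx: int, num_neg: int = 16, skip_top: int = 5, tau: float = 0.1):
    """Sample hard-but-probably-true negatives for one query; assumes a large pool."""
    sims = sims.clone()
    sims[pos_idx] = float("-inf")                 # never sample the positive itself
    order = torch.argsort(sims, descending=True)
    pool = order[skip_top:]                       # drop likely false negatives
    probs = torch.softmax(sims[pool] / tau, dim=0)
    picks = torch.multinomial(probs, num_neg, replacement=False)
    return pool[picks]
```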
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
- Inverse Problems Leveraging Pre-trained Contrastive Representations [88.70821497369785]
We study a new family of inverse problems for recovering representations of corrupted data.
We propose a supervised inversion method that uses a contrastive objective to obtain excellent representations for highly corrupted images.
Our method outperforms end-to-end baselines even with a fraction of the labeled data in a wide range of forward operators.
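Our reading of that objective, as a sketch (the InfoNCE form is an assumption): train an encoder on corrupted images so each embedding lands nearest the frozen pretrained embedding of its clean original.

```python
import torch
import torch.nn.functional as F

def contrastive_inversion_loss(corrupt_emb, clean_emb, tau: float = 0.1):
    """Match embeddings of corrupted inputs to pretrained embeddings of clean ones."""
    s = F.normalize(corrupt_emb, dim=-1)
    t = F.normalize(clean_emb, dim=-1)
    logits = s @ t.T / tau
    labels = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, labels)
```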
arXiv Detail & Related papers (2021-10-14T15:06:30Z)
- Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment [157.1292674649519]
We propose a practical solution named degraded-reference IQA (DR-IQA).
DR-IQA exploits the inputs of image restoration (IR) models, i.e., the degraded images, as references.
Our results can even approach the performance of full-reference settings.
arXiv Detail & Related papers (2021-08-18T02:35:08Z)